mirror of
https://github.com/OneUptime/oneuptime.git
synced 2026-04-06 00:32:12 +02:00
feat(AlertGrouping): Remove outdated migration and implementation documents; add summary for Alert Grouping feature
- Deleted the detailed migration plan (5-Migration.md) and implementation plan (README.md) for Alert Grouping. - Introduced a new summary document (Summary.md) outlining key capabilities, data models, grouping types, and on-call policy resolution for the Alert Grouping feature.
@@ -1,606 +0,0 @@
# API Design for Alert Grouping

## Overview

This document defines the REST API endpoints for Alert Grouping / Episodes functionality.

## Base URLs

All endpoints are prefixed with the project scope:

```
/api/project/{projectId}/alert-episode
/api/project/{projectId}/alert-grouping-rule
```

---

## Episodes API
|
||||
|
||||
### List Episodes
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-episode
|
||||
```
|
||||
|
||||
**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `currentAlertStateId` | ObjectID | Filter by state |
| `alertSeverityId` | ObjectID | Filter by severity |
| `groupingRuleId` | ObjectID | Filter by grouping rule |
| `startedAt` | DateRange | Filter by start time |
| `search` | string | Search in title/description |
| `limit` | number | Results per page (default: 10) |
| `skip` | number | Pagination offset |
| `sort` | string | Sort field (default: `-lastActivityAt`) |

**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"_id": "episode-id-1",
|
||||
"episodeNumber": 42,
|
||||
"title": "Database Connectivity Issues",
|
||||
"description": "Multiple database connection failures",
|
||||
"currentAlertState": {
|
||||
"_id": "state-id",
|
||||
"name": "Active",
|
||||
"color": "#FF0000"
|
||||
},
|
||||
"alertSeverity": {
|
||||
"_id": "severity-id",
|
||||
"name": "Critical",
|
||||
"color": "#FF0000"
|
||||
},
|
||||
"alertCount": 15,
|
||||
"uniqueMonitorCount": 3,
|
||||
"startedAt": "2026-01-20T10:45:00Z",
|
||||
"lastActivityAt": "2026-01-20T10:57:00Z",
|
||||
"groupingRule": {
|
||||
"_id": "rule-id",
|
||||
"name": "Database alerts - 5min"
|
||||
}
|
||||
}
|
||||
],
|
||||
"count": 55,
|
||||
"skip": 0,
|
||||
"limit": 10
|
||||
}
|
||||
```
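
As a rough illustration of how a client might assemble the list request, the sketch below builds the path and query string from the parameters documented above. The helper name `buildListEpisodesPath` and the `ListEpisodesParams` type are hypothetical and not part of the design. `startedAt` is left out because this document does not specify how a DateRange is serialized.

```typescript
// Hypothetical client-side helper. Parameter names come from the query
// parameter table; everything else is an assumption for illustration.
interface ListEpisodesParams {
  currentAlertStateId?: string;
  alertSeverityId?: string;
  groupingRuleId?: string;
  search?: string;
  limit?: number; // documented default: 10
  skip?: number;
  sort?: string; // documented default: -lastActivityAt
}

function buildListEpisodesPath(
  projectId: string,
  params: ListEpisodesParams = {},
): string {
  const query = new URLSearchParams();
  // Only send parameters the caller actually set, so server defaults apply.
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) {
      query.set(key, String(value));
    }
  }
  const base = `/api/project/${encodeURIComponent(projectId)}/alert-episode`;
  const queryString = query.toString();
  return queryString ? `${base}?${queryString}` : base;
}
```

Calling `buildListEpisodesPath("p1", { limit: 10, skip: 10 })` would request the second page of ten episodes under these assumptions.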
|
||||
|
||||
---
|
||||
|
||||
### Get Episode Details
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-episode/{episodeId}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "episode-id-1",
|
||||
"episodeNumber": 42,
|
||||
"title": "Database Connectivity Issues",
|
||||
"description": "Multiple database connection failures",
|
||||
"currentAlertState": {
|
||||
"_id": "state-id",
|
||||
"name": "Active",
|
||||
"color": "#FF0000"
|
||||
},
|
||||
"alertSeverity": {
|
||||
"_id": "severity-id",
|
||||
"name": "Critical",
|
||||
"color": "#FF0000"
|
||||
},
|
||||
"alertCount": 15,
|
||||
"uniqueMonitorCount": 3,
|
||||
"startedAt": "2026-01-20T10:45:00Z",
|
||||
"lastActivityAt": "2026-01-20T10:57:00Z",
|
||||
"acknowledgedAt": null,
|
||||
"resolvedAt": null,
|
||||
"groupingRule": {
|
||||
"_id": "rule-id",
|
||||
"name": "Database alerts - 5min"
|
||||
},
|
||||
"ownerUsers": [],
|
||||
"ownerTeams": [],
|
||||
"labels": [],
|
||||
"rootCause": null
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Create Episode (Manual)
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"title": "Custom Episode Title",
|
||||
"description": "Optional description"
|
||||
}
|
||||
```
|
||||
|
||||
**Response:** Created episode object
|
||||
|
||||
---
|
||||
|
||||
### Update Episode
|
||||
|
||||
```http
|
||||
PUT /api/project/{projectId}/alert-episode/{episodeId}
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"title": "Updated Title",
|
||||
"description": "Updated description",
|
||||
"ownerUsers": ["user-id-1"],
|
||||
"ownerTeams": ["team-id-1"],
|
||||
"labels": ["label-id-1"],
|
||||
"rootCause": "Database connection pool exhausted"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Delete Episode
|
||||
|
||||
```http
|
||||
DELETE /api/project/{projectId}/alert-episode/{episodeId}
|
||||
```
|
||||
|
||||
Deleting an episode removes all member relationships but does NOT delete the alerts themselves. Alerts will have their `episodeId` set to null.
|
||||
|
||||
---
|
||||
|
||||
### Acknowledge Episode
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/{episodeId}/acknowledge
|
||||
```
|
||||
|
||||
**Request Body:**

`acknowledgeAlerts` is optional; set it to `true` to also acknowledge every alert in the episode.

```json
{
  "acknowledgeAlerts": true
}
```

**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "episode-id",
|
||||
"currentAlertState": {
|
||||
"_id": "acknowledged-state-id",
|
||||
"name": "Acknowledged"
|
||||
},
|
||||
"acknowledgedAt": "2026-01-20T11:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Resolve Episode
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/{episodeId}/resolve
|
||||
```
|
||||
|
||||
**Request Body:**

`resolveAlerts` is optional; set it to `true` to also resolve every alert in the episode.

```json
{
  "rootCause": "Database server restarted",
  "resolveAlerts": true
}
```

---
|
||||
|
||||
### Get Episode Alerts
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-episode/{episodeId}/alerts
|
||||
```
|
||||
|
||||
**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `limit` | number | Results per page |
| `skip` | number | Pagination offset |
| `sort` | string | Sort field |

**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"_id": "alert-id-1",
|
||||
"alertNumber": 127,
|
||||
"title": "MySQL connection pool exhausted",
|
||||
"currentAlertState": { ... },
|
||||
"alertSeverity": { ... },
|
||||
"monitor": { ... },
|
||||
"createdAt": "2026-01-20T10:57:00Z",
|
||||
"episodeMembership": {
|
||||
"addedBy": "rule",
|
||||
"addedAt": "2026-01-20T10:57:00Z",
|
||||
"groupingRule": { "_id": "rule-id", "name": "Database alerts" }
|
||||
}
|
||||
}
|
||||
],
|
||||
"count": 15,
|
||||
"skip": 0,
|
||||
"limit": 10
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Add Alert to Episode
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/{episodeId}/add-alert
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"alertId": "alert-id-to-add"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Remove Alert from Episode
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/{episodeId}/remove-alert
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"alertId": "alert-id-to-remove"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Merge Episodes
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/merge
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"targetEpisodeId": "episode-to-keep",
|
||||
"sourceEpisodeIds": ["episode-to-merge-1", "episode-to-merge-2"]
|
||||
}
|
||||
```
|
||||
|
||||
All alerts from source episodes are moved to the target episode. Source episodes are deleted.
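
The merge semantics can be sketched in memory as follows. This is only an illustration, not the service implementation: `EpisodeSketch` and `mergeEpisodes` are hypothetical names, and the sketch assumes that `CANNOT_MERGE_RESOLVED` applies when any involved episode (target or source) is resolved, which this document does not spell out.

```typescript
// Hypothetical in-memory model of an episode, for illustration only.
interface EpisodeSketch {
  id: string;
  isResolved: boolean;
  alertIds: string[];
}

function mergeEpisodes(
  episodes: Map<string, EpisodeSketch>,
  targetEpisodeId: string,
  sourceEpisodeIds: string[],
): EpisodeSketch {
  const target = episodes.get(targetEpisodeId);
  if (!target) {
    throw new Error("EPISODE_NOT_FOUND");
  }
  const sources = sourceEpisodeIds.map((id: string): EpisodeSketch => {
    const source = episodes.get(id);
    if (!source) {
      throw new Error("EPISODE_NOT_FOUND");
    }
    return source;
  });
  // Validate everything before mutating so a failed merge changes nothing.
  if ([target, ...sources].some((episode) => episode.isResolved)) {
    throw new Error("CANNOT_MERGE_RESOLVED");
  }
  for (const source of sources) {
    if (source.id === target.id) {
      continue; // merging an episode into itself is a no-op
    }
    // Move alerts to the target, skipping any it already contains.
    for (const alertId of source.alertIds) {
      if (!target.alertIds.includes(alertId)) {
        target.alertIds.push(alertId);
      }
    }
    // Source episodes are deleted once their alerts have moved.
    episodes.delete(source.id);
  }
  return target;
}
```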
|
||||
|
||||
---
|
||||
|
||||
### Split Episode
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-episode/{episodeId}/split
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"alertIds": ["alert-id-1", "alert-id-2"],
|
||||
"newEpisodeTitle": "Split Episode"
|
||||
}
|
||||
```
|
||||
|
||||
Creates a new episode containing the specified alerts and removes those alerts from the original episode.
|
||||
|
||||
---
|
||||
|
||||
### Get Episode Timeline
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-episode/{episodeId}/timeline
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"type": "alert_added",
|
||||
"timestamp": "2026-01-20T10:57:00Z",
|
||||
"description": "Alert #127 added to episode",
|
||||
"alert": { "_id": "alert-id", "title": "MySQL connection pool exhausted" },
|
||||
"addedBy": "rule"
|
||||
},
|
||||
{
|
||||
"type": "state_change",
|
||||
"timestamp": "2026-01-20T10:50:00Z",
|
||||
"description": "Assigned to John Smith",
|
||||
"user": { "_id": "user-id", "name": "John Smith" }
|
||||
},
|
||||
{
|
||||
"type": "episode_created",
|
||||
"timestamp": "2026-01-20T10:45:00Z",
|
||||
"description": "Episode created with 3 initial alerts",
|
||||
"groupingRule": { "_id": "rule-id", "name": "Database alerts - 5min" }
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Grouping Rules API
|
||||
|
||||
### List Grouping Rules
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-grouping-rule
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"_id": "rule-id-1",
|
||||
"name": "Database Alerts - 5 minute window",
|
||||
"description": "Groups database-related alerts within 5 minutes",
|
||||
"isEnabled": true,
|
||||
"priority": 1,
|
||||
"matchCriteria": {
|
||||
"labelIds": ["database-label-id"],
|
||||
"titlePattern": ".*(connection|database|mysql|postgres).*"
|
||||
},
|
||||
"groupingConfig": {
|
||||
"type": "time_window",
|
||||
"timeWindowMinutes": 5
|
||||
},
|
||||
"episodeConfig": {
|
||||
"titleTemplate": "{{severity}} - Database Issues",
|
||||
"autoResolveWhenEmpty": true,
|
||||
"breakAfterMinutesInactive": 60
|
||||
}
|
||||
}
|
||||
],
|
||||
"count": 3
|
||||
}
|
||||
```
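
To make the rule fields more concrete, here is a minimal sketch of how `matchCriteria` and `priority` might be evaluated for an incoming alert. Several points are assumptions rather than part of this design: criteria are combined with AND, a rule with `labelIds` matches when the alert carries at least one of them, `titlePattern` is tested case-insensitively, and the enabled rule with the lowest `priority` wins. The type and function names are hypothetical.

```typescript
// Field names mirror the rule JSON above; the types themselves are illustrative.
interface MatchCriteria {
  severityIds?: string[];
  labelIds?: string[];
  titlePattern?: string;
}

interface RuleSketch {
  _id: string;
  isEnabled: boolean;
  priority: number;
  matchCriteria: MatchCriteria;
}

interface AlertSketch {
  title: string;
  severityId: string;
  labelIds: string[];
}

function matchesCriteria(alert: AlertSketch, criteria: MatchCriteria): boolean {
  // Assumption: an omitted or empty criterion places no restriction.
  if (criteria.severityIds?.length && !criteria.severityIds.includes(alert.severityId)) {
    return false;
  }
  if (criteria.labelIds?.length && !criteria.labelIds.some((id) => alert.labelIds.includes(id))) {
    return false;
  }
  // Assumption: the pattern is a JavaScript regex, matched case-insensitively.
  if (criteria.titlePattern && !new RegExp(criteria.titlePattern, "i").test(alert.title)) {
    return false;
  }
  return true;
}

function findFirstMatchingRule(
  alert: AlertSketch,
  rules: RuleSketch[],
): RuleSketch | undefined {
  return rules
    .filter((rule) => rule.isEnabled) // filter() copies, so sort() below is safe
    .sort((a, b) => a.priority - b.priority) // lower priority number first
    .find((rule) => matchesCriteria(alert, rule.matchCriteria));
}
```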
|
||||
|
||||
---
|
||||
|
||||
### Get Grouping Rule
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert-grouping-rule/{ruleId}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Create Grouping Rule
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-grouping-rule
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "Database Alerts - 5 minute window",
|
||||
"description": "Groups database-related alerts within 5 minutes",
|
||||
"isEnabled": true,
|
||||
"priority": 1,
|
||||
"matchCriteria": {
|
||||
"severityIds": ["critical-id", "high-id"],
|
||||
"labelIds": ["database-label-id"],
|
||||
"titlePattern": ".*(connection|database).*"
|
||||
},
|
||||
"groupingConfig": {
|
||||
"type": "time_window",
|
||||
"timeWindowMinutes": 5
|
||||
},
|
||||
"episodeConfig": {
|
||||
"titleTemplate": "{{severity}} - Database Issues",
|
||||
"autoResolveWhenEmpty": true,
|
||||
"breakAfterMinutesInactive": 60
|
||||
}
|
||||
}
|
||||
```
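
The `titleTemplate` field uses `{{name}}` placeholders. A minimal rendering sketch is shown below; `renderEpisodeTitle` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: substitutes {{name}} placeholders in a title template.
function renderEpisodeTitle(
  template: string,
  values: Record<string, string | number>,
): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (placeholder: string, name: string): string => {
      // Assumption: placeholders without a supplied value are left as-is.
      return Object.prototype.hasOwnProperty.call(values, name)
        ? String(values[name])
        : placeholder;
    },
  );
}
```

With the template from the request above and a severity of `Critical`, this yields the title `Critical - Database Issues`.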
|
||||
|
||||
---
|
||||
|
||||
### Update Grouping Rule
|
||||
|
||||
```http
|
||||
PUT /api/project/{projectId}/alert-grouping-rule/{ruleId}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Delete Grouping Rule
|
||||
|
||||
```http
|
||||
DELETE /api/project/{projectId}/alert-grouping-rule/{ruleId}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Enable/Disable Grouping Rule
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/enable
|
||||
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/disable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Test Grouping Rule
|
||||
|
||||
```http
|
||||
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/test
|
||||
```
|
||||
|
||||
**Request Body:**
|
||||
|
||||
```json
|
||||
{
|
||||
"alertIds": ["alert-id-1", "alert-id-2", "alert-id-3"]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"matchedAlerts": [
|
||||
{ "_id": "alert-id-1", "title": "MySQL timeout", "wouldMatch": true },
|
||||
{ "_id": "alert-id-2", "title": "API error", "wouldMatch": false },
|
||||
{ "_id": "alert-id-3", "title": "PostgreSQL error", "wouldMatch": true }
|
||||
],
|
||||
"wouldCreateEpisodes": 1,
|
||||
"groupingPreview": [
|
||||
{
|
||||
"episodeTitle": "Critical - Database Issues",
|
||||
"alerts": ["alert-id-1", "alert-id-3"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
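
One way a `time_window` preview such as `groupingPreview` could be computed is sketched below. It assumes the window is anchored at the first alert of each group (a fixed window, not a sliding one); this document does not say which variant is intended. `TimedAlert` and `groupByTimeWindow` are hypothetical names.

```typescript
// Minimal alert shape for the sketch; createdAt is an ISO 8601 timestamp.
interface TimedAlert {
  _id: string;
  createdAt: string;
}

function groupByTimeWindow(
  alerts: TimedAlert[],
  timeWindowMinutes: number,
): string[][] {
  const windowMs = timeWindowMinutes * 60 * 1000;
  // Sort a copy by creation time so the caller's array is left untouched.
  const sorted = [...alerts].sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt),
  );
  const groups: string[][] = [];
  let windowStart = 0;
  for (const alert of sorted) {
    const timestamp = Date.parse(alert.createdAt);
    const current = groups[groups.length - 1];
    if (current !== undefined && timestamp - windowStart <= windowMs) {
      // Still inside the window opened by the first alert of this group.
      current.push(alert._id);
    } else {
      // Outside the window (or first alert): open a new group.
      groups.push([alert._id]);
      windowStart = timestamp;
    }
  }
  return groups;
}
```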
|
||||
|
||||
---
|
||||
|
||||
## Existing Alert API Changes
|
||||
|
||||
### Alert Response Enhancement
|
||||
|
||||
The existing Alert response will include episode information:
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "alert-id",
|
||||
"alertNumber": 127,
|
||||
"title": "MySQL connection pool exhausted",
|
||||
"episode": {
|
||||
"_id": "episode-id",
|
||||
"episodeNumber": 42,
|
||||
"title": "Database Connectivity Issues"
|
||||
},
|
||||
"fingerprint": "abc123...",
|
||||
"duplicateCount": 5
|
||||
}
|
||||
```
|
||||
|
||||
### Filter Alerts by Episode
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert?episodeId={episodeId}
|
||||
```
|
||||
|
||||
### Get Ungrouped Alerts
|
||||
|
||||
```http
|
||||
GET /api/project/{projectId}/alert?episodeId=null
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API Implementation Notes
|
||||
|
||||
### Permissions
|
||||
|
||||
| Endpoint | Required Permission |
|----------|---------------------|
| GET episodes | `ProjectMember` |
| Create/Update/Delete episodes | `ProjectAdmin` |
| Acknowledge/Resolve episodes | `ProjectMember` |
| GET grouping rules | `ProjectMember` |
| Create/Update/Delete grouping rules | `ProjectAdmin` |

### Error Responses
|
||||
|
||||
```json
|
||||
{
|
||||
"error": {
|
||||
"code": "EPISODE_NOT_FOUND",
|
||||
"message": "Episode with ID xxx not found"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Common error codes:
- `EPISODE_NOT_FOUND` - Episode doesn't exist
- `ALERT_NOT_FOUND` - Alert doesn't exist
- `ALERT_ALREADY_IN_EPISODE` - Alert is already part of an episode
- `CANNOT_MERGE_RESOLVED` - Cannot merge resolved episodes
- `INVALID_GROUPING_CONFIG` - Invalid grouping rule configuration
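
On the client side, the error envelope could be recognized with a small type guard. The sketch below is an assumption about how a consumer might handle it; `ApiErrorBody`, `isApiErrorBody`, and `isKnownErrorCode` are hypothetical names. Because the list above is described as "common" codes, the guard accepts any string code and a separate helper checks for the documented ones.

```typescript
// Shape of the documented error envelope.
interface ApiErrorBody {
  error: {
    code: string;
    message: string;
  };
}

// Codes listed in this document; the server may return others.
const KNOWN_ERROR_CODES: readonly string[] = [
  "EPISODE_NOT_FOUND",
  "ALERT_NOT_FOUND",
  "ALERT_ALREADY_IN_EPISODE",
  "CANNOT_MERGE_RESOLVED",
  "INVALID_GROUPING_CONFIG",
];

function isApiErrorBody(body: unknown): body is ApiErrorBody {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const error = (body as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) {
    return false;
  }
  const { code, message } = error as { code?: unknown; message?: unknown };
  return typeof code === "string" && typeof message === "string";
}

function isKnownErrorCode(code: string): boolean {
  return KNOWN_ERROR_CODES.includes(code);
}
```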
|
||||
|
||||
### Rate Limiting
|
||||
|
||||
Standard API rate limits apply. Batch operations (merge, bulk add) count as multiple operations.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Episode API
- [ ] GET /alert-episode (list)
- [ ] GET /alert-episode/:id (details)
- [ ] POST /alert-episode (create)
- [ ] PUT /alert-episode/:id (update)
- [ ] DELETE /alert-episode/:id (delete)
- [ ] POST /alert-episode/:id/acknowledge
- [ ] POST /alert-episode/:id/resolve
- [ ] GET /alert-episode/:id/alerts
- [ ] POST /alert-episode/:id/add-alert
- [ ] POST /alert-episode/:id/remove-alert
- [ ] POST /alert-episode/merge
- [ ] POST /alert-episode/:id/split
- [ ] GET /alert-episode/:id/timeline

### Grouping Rule API
- [ ] GET /alert-grouping-rule (list)
- [ ] GET /alert-grouping-rule/:id (details)
- [ ] POST /alert-grouping-rule (create)
- [ ] PUT /alert-grouping-rule/:id (update)
- [ ] DELETE /alert-grouping-rule/:id (delete)
- [ ] POST /alert-grouping-rule/:id/enable
- [ ] POST /alert-grouping-rule/:id/disable
- [ ] POST /alert-grouping-rule/:id/test

### Alert API Updates
- [ ] Add episode field to alert response
- [ ] Add episodeId filter to alert list
- [ ] Add fingerprint field to alert response

@@ -1,669 +0,0 @@
|
||||
# UI Implementation for Alert Grouping
|
||||
|
||||
## Overview
|
||||
|
||||
This document details the frontend components and pages required for Alert Grouping / Episodes functionality.
|
||||
|
||||
## Navigation Structure
|
||||
|
||||
```
Dashboard
├── Alerts
│   ├── All Alerts (existing)
│   └── Episodes (NEW)
└── Settings
    ├── Alerts
    │   ├── Alert States (existing)
    │   ├── Alert Severities (existing)
    │   └── Grouping Rules (NEW)
```
|
||||
|
||||
---
|
||||
|
||||
## Pages to Create
|
||||
|
||||
### 1. Episodes List Page
|
||||
|
||||
**File Location:** `/Dashboard/src/Pages/Alerts/Episodes.tsx`
|
||||
|
||||
**Route:** `/dashboard/:projectId/alerts/episodes`
|
||||
|
||||
**Wireframe:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Alerts > Episodes [+ Create Episode] │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌────────┬──────────────┬────────────┬───────┐ ┌─────────────────────────────┐ │
|
||||
│ │ Active │ Acknowledged │ Resolved │ All │ │ 🔍 Search episodes... │ │
|
||||
│ │ (5) │ (2) │ (48) │ (55) │ └─────────────────────────────┘ │
|
||||
│ └────────┴──────────────┴────────────┴───────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ ● EP-42 Database Connectivity Issues 🔴 Critical │ │
|
||||
│ │ ┌─────────────────────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ 15 alerts │ 3 monitors │ Started 10 min ago │ Last activity: 2 min ago │ │ │
|
||||
│ │ └─────────────────────────────────────────────────────────────────────────┘ │ │
|
||||
│ │ │ │
|
||||
│ │ Preview: │ │
|
||||
│ │ • Alert #123: MySQL connection timeout on web-server-1 │ │
|
||||
│ │ • Alert #124: MySQL connection timeout on web-server-2 │ │
|
||||
│ │ • Alert #125: PostgreSQL connection refused on api-server │ │
|
||||
│ │ └── +12 more alerts │ │
|
||||
│ │ │ │
|
||||
│ │ Rule: "Group database alerts within 5 min" [Acknowledge] [Resolve] │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ ● EP-41 High CPU Utilization 🟠 High │ │
|
||||
│ │ ... │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ [1] [2] [3] ... [Next →] │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```typescript
// /Dashboard/src/Pages/Alerts/Episodes.tsx

import React, { FunctionComponent, ReactElement } from 'react';
import PageComponentProps from '../PageComponentProps';
import ModelTable from 'Common/UI/Components/ModelTable/ModelTable';
import AlertEpisode from 'Common/Models/DatabaseModels/AlertEpisode';
import FieldType from 'Common/UI/Components/Types/FieldType';
import Navigation from 'Common/UI/Utils/Navigation';
import DashboardNavigation from '../../Utils/Navigation';
import AlertSeverity from 'Common/Models/DatabaseModels/AlertSeverity';
import AlertState from 'Common/Models/DatabaseModels/AlertState';
import Pill from 'Common/UI/Components/Pill/Pill';
import { Black } from 'Common/Types/BrandColors';

const EpisodesPage: FunctionComponent<PageComponentProps> = (
  props: PageComponentProps
): ReactElement => {
  return (
    <ModelTable<AlertEpisode>
      modelType={AlertEpisode}
      id="episodes-table"
      isDeleteable={true}
      isEditable={false}
      isCreateable={true}
      isViewable={true}
      name="Episodes"
      // Only list episodes that belong to the current project.
      query={{
        projectId: DashboardNavigation.getProjectId()!,
      }}
      cardProps={{
        title: 'Episodes',
        description:
          'Episodes group related alerts together for easier management.',
      }}
      // Fields fetched in addition to those selected by the columns.
      selectMoreFields={{
        alertCount: true,
        uniqueMonitorCount: true,
        startedAt: true,
        lastActivityAt: true,
      }}
      columns={[
        {
          field: {
            episodeNumber: true,
          },
          title: 'Episode',
          type: FieldType.Text,
          // Render the episode number with the "EP-" prefix.
          getElement: (item: AlertEpisode): ReactElement => {
            return (
              <span className="font-medium">
                EP-{item.episodeNumber}
              </span>
            );
          },
        },
        {
          field: {
            title: true,
          },
          title: 'Title',
          type: FieldType.Text,
        },
        {
          field: {
            currentAlertState: {
              name: true,
              color: true,
            },
          },
          title: 'State',
          type: FieldType.Entity,
          getElement: (item: AlertEpisode): ReactElement => {
            if (!item.currentAlertState) {
              return <></>;
            }
            // Fall back to black when the state has no color.
            return (
              <Pill
                text={item.currentAlertState.name || ''}
                color={item.currentAlertState.color || Black}
              />
            );
          },
        },
        {
          field: {
            alertSeverity: {
              name: true,
              color: true,
            },
          },
          title: 'Severity',
          type: FieldType.Entity,
          getElement: (item: AlertEpisode): ReactElement => {
            if (!item.alertSeverity) {
              return <></>;
            }
            return (
              <Pill
                text={item.alertSeverity.name || ''}
                color={item.alertSeverity.color || Black}
              />
            );
          },
        },
        {
          field: {
            alertCount: true,
          },
          title: 'Alerts',
          type: FieldType.Number,
        },
        {
          field: {
            lastActivityAt: true,
          },
          title: 'Last Activity',
          type: FieldType.DateTime,
        },
      ]}
      filters={[
        {
          field: {
            currentAlertState: {
              _id: true,
            },
          },
          title: 'State',
          type: FieldType.Entity,
          filterEntityType: AlertState,
          filterQuery: {
            projectId: DashboardNavigation.getProjectId()!,
          },
        },
        {
          field: {
            alertSeverity: {
              _id: true,
            },
          },
          title: 'Severity',
          type: FieldType.Entity,
          filterEntityType: AlertSeverity,
          filterQuery: {
            projectId: DashboardNavigation.getProjectId()!,
          },
        },
      ]}
      // Open the episode detail page when a row is viewed.
      onViewPage={(item: AlertEpisode): void => {
        Navigation.navigate(
          DashboardNavigation.getAlertEpisodeViewRoute(item._id!)
        );
      }}
    />
  );
};

export default EpisodesPage;
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Episode Detail Page
|
||||
|
||||
**File Location:** `/Dashboard/src/Pages/Alerts/EpisodeView/Index.tsx`
|
||||
|
||||
**Route:** `/dashboard/:projectId/alerts/episodes/:episodeId`
|
||||
|
||||
**Wireframe:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ← Episodes EP-42: Database Connectivity Issues │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────┐ ┌──────────────────────────────┐ │
|
||||
│ │ Status │ 🔴 Active │ │ Actions │ │
|
||||
│ │ Severity │ Critical │ │ ┌────────────────────────┐ │ │
|
||||
│ │ Started │ Jan 20, 2026 10:45 AM │ │ │ [Acknowledge] │ │ │
|
||||
│ │ Last Activity │ 2 min ago │ │ │ [Resolve] │ │ │
|
||||
│ │ Alert Count │ 15 │ │ │ [Add Alert] │ │ │
|
||||
│ │ Monitors │ 3 │ │ │ [Merge Episodes] │ │ │
|
||||
│ └──────────────────────────────────────────────┘ └──────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Tabs: [Overview] [Alerts (15)] [Timeline] [Settings] │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ OVERVIEW TAB: │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Description [Edit] │ │
|
||||
│ │ Multiple database connection failures affecting production services │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Assigned To [Edit] │ │
|
||||
│ │ 👤 John Smith (DBA Team) │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Root Cause Analysis [Edit] │ │
|
||||
│ │ Database connection pool exhausted due to connection leak in payment service │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Grouping Rule │ │
|
||||
│ │ "Database alerts - 5min" (Time Window: 5 minutes) │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Sub-pages:**
|
||||
|
||||
| Route | Component | Description |
|-------|-----------|-------------|
| `/episodes/:id` | Overview | Episode details, owners, root cause |
| `/episodes/:id/alerts` | Alerts | List of alerts in episode |
| `/episodes/:id/timeline` | Timeline | Episode activity timeline |
| `/episodes/:id/settings` | Settings | Delete episode |

---
|
||||
|
||||
### 3. Episode Alerts Tab
|
||||
|
||||
**File Location:** `/Dashboard/src/Pages/Alerts/EpisodeView/Alerts.tsx`
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ALERTS TAB: [+ Add Alert] │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌───────┬──────────────────────────────────────────────┬──────────┬───────┬──────┐ │
|
||||
│ │ ID │ Title │ Monitor │ State │ ··· │ │
|
||||
│ ├───────┼──────────────────────────────────────────────┼──────────┼───────┼──────┤ │
|
||||
│ │ #127 │ MySQL connection pool exhausted │ mysql-01 │ ● Act │ [x] │ │
|
||||
│ │ #126 │ MySQL connection timeout │ web-02 │ ● Act │ [x] │ │
|
||||
│ │ #125 │ PostgreSQL connection refused │ api-01 │ ✓ Res │ [x] │ │
|
||||
│ │ #124 │ MySQL connection timeout │ web-02 │ ● Act │ [x] │ │
|
||||
│ │ #123 │ MySQL connection timeout │ web-01 │ ● Act │ [x] │ │
|
||||
│ └───────┴──────────────────────────────────────────────┴──────────┴───────┴──────┘ │
|
||||
│ │
|
||||
│ Note: [x] = Remove from episode button │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Grouping Rules Page
|
||||
|
||||
**File Location:** `/Dashboard/src/Pages/Settings/AlertGroupingRules.tsx`
|
||||
|
||||
**Route:** `/dashboard/:projectId/settings/alert-grouping-rules`
|
||||
|
||||
**Wireframe:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Settings > Alert Grouping Rules [+ Create Rule] │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Grouping rules automatically combine related alerts into Episodes. │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ ✅ Database Alerts - 5 minute window Priority: 1 │ │
|
||||
│ │ ────────────────────────────────────────────────────────────────────────────── │ │
|
||||
│ │ Type: Time Window (5 minutes) │ │
|
||||
│ │ Matches: Monitors with label "database" │ │
|
||||
│ │ Episodes created: 23 │ Alerts grouped: 156 │ │
|
||||
│ │ [Edit] [Delete]│ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ ❌ Smart Grouping (Disabled) Priority: 2 │ │
|
||||
│ │ ────────────────────────────────────────────────────────────────────────────── │ │
|
||||
│ │ Type: Smart (80% similarity) │ │
|
||||
│ │ Matches: All critical alerts │ │
|
||||
│ │ [Enable] [Edit] [Delete]│ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 5. Create/Edit Grouping Rule Form
|
||||
|
||||
**File Location:** `/Dashboard/src/Pages/Settings/AlertGroupingRuleView/Index.tsx`
|
||||
|
||||
**Wireframe:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Create Grouping Rule │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ BASIC INFORMATION │
|
||||
│ ───────────────────────────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Rule Name * │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Database Alerts - 5 minute window │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Description │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Groups database-related alerts within 5 minutes │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Priority (lower = evaluated first) │
|
||||
│ ┌──────────┐ │
|
||||
│ │ 1 │ │
|
||||
│ └──────────┘ │
|
||||
│ │
|
||||
│ MATCHING CRITERIA │
|
||||
│ ───────────────────────────────────────────────────────────────────────────────── │
|
||||
│ Which alerts should this rule apply to? │
|
||||
│ │
|
||||
│ Severities (optional) │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ [Critical ×] [High ×] [+ Add] │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Labels (optional) │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ [database ×] [+ Add] │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Title Pattern (regex, optional) │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ .*(connection|database|mysql|postgres).* │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ GROUPING METHOD │
|
||||
│ ───────────────────────────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Grouping Type * │
|
||||
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
||||
│ │ ● Time Window │ │ ○ Field-Based │ │ ○ Smart (Beta) │ │
|
||||
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
||||
│ │
|
||||
│ Time Window (minutes) * │
|
||||
│ ┌──────────┐ │
|
||||
│ │ 5 │ Alerts arriving within this window will be grouped together. │
|
||||
│ └──────────┘ │
|
||||
│ │
|
||||
│ EPISODE SETTINGS │
|
||||
│ ───────────────────────────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Episode Title Template │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ {{severity}} - Database Issues │ │
|
||||
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ Available: {{severity}}, {{monitor}}, {{alertCount}} │
|
||||
│ │
|
||||
│ ☑ Auto-resolve episode when all alerts are resolved │
|
||||
│ │
|
||||
│ Break episode after inactive for (minutes) │
|
||||
│ ┌──────────┐ │
|
||||
│ │ 60 │ │
|
||||
│ └──────────┘ │
|
||||
│ │
|
||||
│ [Cancel] [Test Rule] [Save] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
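
The Episode Title Template field in the mockup above could be expanded roughly as follows. This is a minimal sketch: `renderEpisodeTitle` and `TitleContext` are hypothetical names, and leaving unknown placeholders untouched is an assumption, not behavior specified by the plan.

```typescript
// Hypothetical context carrying the three placeholders listed in the mockup.
interface TitleContext {
  severity: string;
  monitor: string;
  alertCount: number;
}

// Replace {{name}} placeholders with values from the context.
// Unknown placeholders are left as-is (assumed behavior).
function renderEpisodeTitle(template: string, context: TitleContext): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (match: string, key: string): string => {
      switch (key) {
        case "severity":
          return context.severity;
        case "monitor":
          return context.monitor;
        case "alertCount":
          return String(context.alertCount);
        default:
          return match;
      }
    }
  );
}
```

With the template from the mockup, `{{severity}} - Database Issues`, and a severity of `Critical`, this yields `Critical - Database Issues`.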
|
||||
|
||||
---
|
||||
|
||||
## Existing Page Modifications
|
||||
|
||||
### 1. Alerts Table Enhancement
|
||||
|
||||
Add an Episode column to the existing Alerts table.
|
||||
|
||||
**File:** `/Dashboard/src/Pages/Alerts/View/Index.tsx`
|
||||
|
||||
```typescript
|
||||
// Add to columns array:
|
||||
{
|
||||
field: {
|
||||
episode: {
|
||||
_id: true,
|
||||
episodeNumber: true,
|
||||
title: true,
|
||||
},
|
||||
},
|
||||
title: 'Episode',
|
||||
type: FieldType.Entity,
|
||||
getElement: (item: Alert): ReactElement => {
|
||||
if (!item.episode) {
|
||||
return <span className="text-gray-400">—</span>;
|
||||
}
|
||||
return (
|
||||
<Link
|
||||
to={DashboardNavigation.getAlertEpisodeViewRoute(
|
||||
item.episode._id!
|
||||
)}
|
||||
>
|
||||
EP-{item.episode.episodeNumber}
|
||||
</Link>
|
||||
);
|
||||
},
|
||||
},
|
||||
```
|
||||
|
||||
### 2. Alert Detail Page Enhancement
|
||||
|
||||
Show episode membership on the alert detail page.
|
||||
|
||||
**File:** `/Dashboard/src/Pages/Alerts/AlertView/Index.tsx`
|
||||
|
||||
Add a card showing:
|
||||
- Episode badge (if part of episode)
|
||||
- Link to episode detail
|
||||
- Button to remove from episode
|
||||
|
||||
---
|
||||
|
||||
## Components to Create
|
||||
|
||||
### 1. EpisodeCard Component
|
||||
|
||||
**File:** `/Dashboard/src/Components/Episode/EpisodeCard.tsx`
|
||||
|
||||
Reusable card for displaying episode summary.
|
||||
|
||||
```typescript
|
||||
interface EpisodeCardProps {
|
||||
episode: AlertEpisode;
|
||||
showAlertPreview?: boolean;
|
||||
onAcknowledge?: () => void;
|
||||
onResolve?: () => void;
|
||||
}
|
||||
```
|
||||
|
||||
### 2. EpisodeBadge Component
|
||||
|
||||
**File:** `/Dashboard/src/Components/Episode/EpisodeBadge.tsx`
|
||||
|
||||
Small badge showing episode number and link.
|
||||
|
||||
```typescript
|
||||
interface EpisodeBadgeProps {
|
||||
episodeNumber: number;
|
||||
episodeId: ObjectID;
|
||||
}
|
||||
```
|
||||
|
||||
### 3. AddAlertToEpisodeModal Component
|
||||
|
||||
**File:** `/Dashboard/src/Components/Episode/AddAlertToEpisodeModal.tsx`
|
||||
|
||||
Modal for manually adding alerts to an episode.
|
||||
|
||||
### 4. MergeEpisodesModal Component
|
||||
|
||||
**File:** `/Dashboard/src/Components/Episode/MergeEpisodesModal.tsx`
|
||||
|
||||
Modal for merging multiple episodes.
|
||||
|
||||
### 5. GroupingRuleForm Component
|
||||
|
||||
**File:** `/Dashboard/src/Components/GroupingRule/GroupingRuleForm.tsx`
|
||||
|
||||
Form for creating/editing grouping rules with:
|
||||
- Match criteria builder
|
||||
- Grouping type selector
|
||||
- Episode config options
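
The match criteria the form collects (severities, labels, title pattern) might be evaluated as sketched below. `MatchCriteria`, `AlertLike`, and `matchesCriteria` are hypothetical names. The semantics are also assumptions: every criterion that is set must match, any one listed severity or label is enough, and the title regex is case-insensitive.

```typescript
// Hypothetical shapes; the real models are defined elsewhere in the plan.
interface MatchCriteria {
  severities?: string[];
  labels?: string[];
  titlePattern?: string;
}

interface AlertLike {
  title: string;
  severity: string;
  labels: string[];
}

// Returns true when the alert satisfies every criterion that is set.
// Missing or empty criteria match everything, since all fields are optional.
function matchesCriteria(alert: AlertLike, criteria: MatchCriteria): boolean {
  if (criteria.severities && criteria.severities.length > 0) {
    if (criteria.severities.indexOf(alert.severity) === -1) {
      return false;
    }
  }
  if (criteria.labels && criteria.labels.length > 0) {
    const hasLabel = criteria.labels.some(
      (label) => alert.labels.indexOf(label) !== -1
    );
    if (!hasLabel) {
      return false;
    }
  }
  if (criteria.titlePattern) {
    // Case-insensitive matching is an assumption.
    if (!new RegExp(criteria.titlePattern, "i").test(alert.title)) {
      return false;
    }
  }
  return true;
}
```

A "Test Rule" action could run a function like this over recent alerts to preview which ones a rule would capture.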
|
||||
|
||||
---
|
||||
|
||||
## Routing Configuration
|
||||
|
||||
Add to `/Dashboard/src/Routes/AlertRoutes.tsx`:
|
||||
|
||||
```typescript
|
||||
// Episode routes
|
||||
{
|
||||
path: '/dashboard/:projectId/alerts/episodes',
|
||||
component: EpisodesPage,
|
||||
},
|
||||
{
|
||||
path: '/dashboard/:projectId/alerts/episodes/:episodeId',
|
||||
component: EpisodeViewLayout,
|
||||
children: [
|
||||
{
|
||||
path: '',
|
||||
component: EpisodeOverview,
|
||||
},
|
||||
{
|
||||
path: 'alerts',
|
||||
component: EpisodeAlerts,
|
||||
},
|
||||
{
|
||||
path: 'timeline',
|
||||
component: EpisodeTimeline,
|
||||
},
|
||||
{
|
||||
path: 'settings',
|
||||
component: EpisodeSettings,
|
||||
},
|
||||
],
|
||||
},
|
||||
```
|
||||
|
||||
Add to `/Dashboard/src/Routes/SettingsRoutes.tsx`:
|
||||
|
||||
```typescript
|
||||
// Grouping rule routes
|
||||
{
|
||||
path: '/dashboard/:projectId/settings/alert-grouping-rules',
|
||||
component: AlertGroupingRulesPage,
|
||||
},
|
||||
{
|
||||
path: '/dashboard/:projectId/settings/alert-grouping-rules/:ruleId',
|
||||
component: AlertGroupingRuleViewLayout,
|
||||
},
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Navigation Helper Updates
|
||||
|
||||
Add to `/Dashboard/src/Utils/Navigation.ts`:
|
||||
|
||||
```typescript
|
||||
public static getAlertEpisodesRoute(projectId?: ObjectID): Route {
|
||||
return new Route(`/dashboard/${(projectId ?? this.getProjectId())?.toString()}/alerts/episodes`);
|
||||
}
|
||||
|
||||
public static getAlertEpisodeViewRoute(episodeId: ObjectID): Route {
|
||||
return new Route(
|
||||
`/dashboard/${this.getProjectId()?.toString()}/alerts/episodes/${episodeId.toString()}`
|
||||
);
|
||||
}
|
||||
|
||||
public static getAlertGroupingRulesRoute(): Route {
|
||||
return new Route(
|
||||
`/dashboard/${this.getProjectId()?.toString()}/settings/alert-grouping-rules`
|
||||
);
|
||||
}
|
||||
```
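
One thing to watch in helpers like these: interpolating a missing project ID produces a path containing the literal text `undefined`. The sketch below shows one way to fail fast instead. It works on plain strings rather than the `Route` and `ObjectID` classes, and `alertEpisodesPath` and `alertEpisodeViewPath` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-alone helpers; the real code would return Route objects.
function alertEpisodesPath(projectId: string | undefined): string {
  if (!projectId) {
    // Failing fast avoids generating "/dashboard/undefined/alerts/episodes".
    throw new Error("projectId is required to build an episode route");
  }
  return `/dashboard/${encodeURIComponent(projectId)}/alerts/episodes`;
}

function alertEpisodeViewPath(
  projectId: string | undefined,
  episodeId: string
): string {
  return `${alertEpisodesPath(projectId)}/${encodeURIComponent(episodeId)}`;
}
```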
|
||||
|
||||
---
|
||||
|
||||
## Sidebar Menu Updates
|
||||
|
||||
Add to Alerts section in `/Dashboard/src/Components/Sidebar/Sidebar.tsx`:
|
||||
|
||||
```typescript
|
||||
{
|
||||
title: 'Episodes',
|
||||
route: RouteMap.AlertEpisodes,
|
||||
icon: IconProp.Layers,
|
||||
}
|
||||
```
|
||||
|
||||
Add to Settings > Alerts section:
|
||||
|
||||
```typescript
|
||||
{
|
||||
title: 'Grouping Rules',
|
||||
route: RouteMap.AlertGroupingRules,
|
||||
icon: IconProp.Layers,
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Pages
|
||||
- [ ] Episodes list page
|
||||
- [ ] Episode detail page (overview)
|
||||
- [ ] Episode alerts tab
|
||||
- [ ] Episode timeline tab
|
||||
- [ ] Episode settings tab
|
||||
- [ ] Grouping rules list page
|
||||
- [ ] Grouping rule detail/edit page
|
||||
|
||||
### Components
|
||||
- [ ] EpisodeCard component
|
||||
- [ ] EpisodeBadge component
|
||||
- [ ] AddAlertToEpisodeModal
|
||||
- [ ] MergeEpisodesModal
|
||||
- [ ] GroupingRuleForm
|
||||
- [ ] GroupingTypeSelector
|
||||
|
||||
### Existing Page Updates
|
||||
- [ ] Add Episode column to Alerts table
|
||||
- [ ] Add Episode card to Alert detail page
|
||||
- [ ] Add sidebar navigation items
|
||||
- [ ] Update route configuration
|
||||
|
||||
### Styling
|
||||
- [ ] Episode card styles
|
||||
- [ ] Episode badge styles
|
||||
- [ ] Grouping rule form styles
|
||||
- [ ] Timeline component styles
|
||||
@@ -1,888 +0,0 @@
|
||||
# Migration & Rollout Plan for Alert Grouping
|
||||
|
||||
## Overview
|
||||
|
||||
This document outlines the database migrations, feature flags, and rollout strategy for Alert Grouping / Episodes functionality.
|
||||
|
||||
## Database Migrations
|
||||
|
||||
### Migration 1: Create AlertGroupingRule Table
|
||||
|
||||
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertGroupingRule.ts`
|
||||
|
||||
```typescript
|
||||
import { MigrationInterface, QueryRunner, Table, TableIndex } from 'typeorm';
|
||||
|
||||
export class CreateAlertGroupingRule implements MigrationInterface {
|
||||
public async up(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertGroupingRule',
|
||||
columns: [
|
||||
{
|
||||
name: '_id',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
default: 'uuid_generate_v4()',
|
||||
},
|
||||
{
|
||||
name: 'projectId',
|
||||
type: 'uuid',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'name',
|
||||
type: 'varchar',
|
||||
length: '500',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'description',
|
||||
type: 'text',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'isEnabled',
|
||||
type: 'boolean',
|
||||
default: true,
|
||||
},
|
||||
{
|
||||
name: 'matchCriteria',
|
||||
type: 'jsonb',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'groupingConfig',
|
||||
type: 'jsonb',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'episodeConfig',
|
||||
type: 'jsonb',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'priority',
|
||||
type: 'integer',
|
||||
default: 100,
|
||||
},
|
||||
{
|
||||
name: 'createdAt',
|
||||
type: 'timestamp',
|
||||
default: 'CURRENT_TIMESTAMP',
|
||||
},
|
||||
{
|
||||
name: 'updatedAt',
|
||||
type: 'timestamp',
|
||||
default: 'CURRENT_TIMESTAMP',
|
||||
},
|
||||
{
|
||||
name: 'deletedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
await queryRunner.createIndex(
|
||||
'AlertGroupingRule',
|
||||
new TableIndex({
|
||||
name: 'idx_grouping_rule_project_enabled',
|
||||
columnNames: ['projectId', 'isEnabled', 'priority'],
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
public async down(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.dropTable('AlertGroupingRule');
|
||||
}
|
||||
}
|
||||
```
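
The composite index on `projectId`, `isEnabled`, and `priority` suggests the lookup the grouping engine will run: fetch a project's enabled rules in priority order. An in-memory sketch of that query follows. `GroupingRuleRow` and `rulesInEvaluationOrder` are hypothetical names, and treating a lower `priority` number as "evaluated first" is an assumption; the plan does not state the direction.

```typescript
// Hypothetical row shape with only the columns the index covers.
interface GroupingRuleRow {
  _id: string;
  projectId: string;
  isEnabled: boolean;
  priority: number;
}

// Return the project's enabled rules in evaluation order.
// Assumption: a lower priority number is evaluated first.
function rulesInEvaluationOrder(
  rules: GroupingRuleRow[],
  projectId: string
): GroupingRuleRow[] {
  return rules
    .filter((rule) => rule.projectId === projectId && rule.isEnabled)
    .sort((a, b) => a.priority - b.priority);
}
```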
|
||||
|
||||
---
|
||||
|
||||
### Migration 2: Create AlertEpisode Table
|
||||
|
||||
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertEpisode.ts`
|
||||
|
||||
```typescript
|
||||
import { MigrationInterface, QueryRunner, Table, TableIndex, TableForeignKey } from 'typeorm';
|
||||
|
||||
export class CreateAlertEpisode implements MigrationInterface {
|
||||
public async up(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertEpisode',
|
||||
columns: [
|
||||
{
|
||||
name: '_id',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
default: 'uuid_generate_v4()',
|
||||
},
|
||||
{
|
||||
name: 'projectId',
|
||||
type: 'uuid',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'episodeNumber',
|
||||
type: 'integer',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'title',
|
||||
type: 'varchar',
|
||||
length: '500',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'description',
|
||||
type: 'text',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'groupingRuleId',
|
||||
type: 'uuid',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'currentAlertStateId',
|
||||
type: 'uuid',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'alertSeverityId',
|
||||
type: 'uuid',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'startedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'lastActivityAt',
|
||||
type: 'timestamp',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'acknowledgedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'resolvedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'alertCount',
|
||||
type: 'integer',
|
||||
default: 0,
|
||||
},
|
||||
{
|
||||
name: 'uniqueMonitorCount',
|
||||
type: 'integer',
|
||||
default: 0,
|
||||
},
|
||||
{
|
||||
name: 'rootCause',
|
||||
type: 'text',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'createdAt',
|
||||
type: 'timestamp',
|
||||
default: 'CURRENT_TIMESTAMP',
|
||||
},
|
||||
{
|
||||
name: 'updatedAt',
|
||||
type: 'timestamp',
|
||||
default: 'CURRENT_TIMESTAMP',
|
||||
},
|
||||
{
|
||||
name: 'deletedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
// Indexes
|
||||
await queryRunner.createIndex(
|
||||
'AlertEpisode',
|
||||
new TableIndex({
|
||||
name: 'idx_episode_project_state',
|
||||
columnNames: ['projectId', 'currentAlertStateId', 'lastActivityAt'],
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createIndex(
|
||||
'AlertEpisode',
|
||||
new TableIndex({
|
||||
name: 'idx_episode_grouping_rule',
|
||||
columnNames: ['projectId', 'groupingRuleId', 'currentAlertStateId'],
|
||||
})
|
||||
);
|
||||
|
||||
// Foreign keys
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisode',
|
||||
new TableForeignKey({
|
||||
columnNames: ['projectId'],
|
||||
referencedTableName: 'Project',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisode',
|
||||
new TableForeignKey({
|
||||
columnNames: ['groupingRuleId'],
|
||||
referencedTableName: 'AlertGroupingRule',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'SET NULL',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisode',
|
||||
new TableForeignKey({
|
||||
columnNames: ['currentAlertStateId'],
|
||||
referencedTableName: 'AlertState',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'SET NULL',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisode',
|
||||
new TableForeignKey({
|
||||
columnNames: ['alertSeverityId'],
|
||||
referencedTableName: 'AlertSeverity',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'SET NULL',
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
public async down(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.dropTable('AlertEpisode');
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Migration 3: Create AlertEpisodeMember Table
|
||||
|
||||
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertEpisodeMember.ts`
|
||||
|
||||
```typescript
|
||||
import { MigrationInterface, QueryRunner, Table, TableIndex, TableForeignKey } from 'typeorm';
|
||||
|
||||
export class CreateAlertEpisodeMember implements MigrationInterface {
|
||||
public async up(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertEpisodeMember',
|
||||
columns: [
|
||||
{
|
||||
name: '_id',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
default: 'uuid_generate_v4()',
|
||||
},
|
||||
{
|
||||
name: 'projectId',
|
||||
type: 'uuid',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'episodeId',
|
||||
type: 'uuid',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'alertId',
|
||||
type: 'uuid',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'addedBy',
|
||||
type: 'varchar',
|
||||
length: '50',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'addedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: false,
|
||||
},
|
||||
{
|
||||
name: 'groupingRuleId',
|
||||
type: 'uuid',
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'similarityScore',
|
||||
type: 'decimal',
|
||||
precision: 5,
|
||||
scale: 4,
|
||||
isNullable: true,
|
||||
},
|
||||
{
|
||||
name: 'createdAt',
|
||||
type: 'timestamp',
|
||||
default: 'CURRENT_TIMESTAMP',
|
||||
},
|
||||
{
|
||||
name: 'deletedAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
// Indexes
|
||||
await queryRunner.createIndex(
|
||||
'AlertEpisodeMember',
|
||||
new TableIndex({
|
||||
name: 'idx_episode_member_episode',
|
||||
columnNames: ['episodeId', 'addedAt'],
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createIndex(
|
||||
'AlertEpisodeMember',
|
||||
new TableIndex({
|
||||
name: 'idx_episode_member_alert',
|
||||
columnNames: ['alertId'],
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createIndex(
|
||||
'AlertEpisodeMember',
|
||||
new TableIndex({
|
||||
name: 'idx_episode_member_unique',
|
||||
columnNames: ['episodeId', 'alertId'],
|
||||
isUnique: true,
|
||||
})
|
||||
);
|
||||
|
||||
// Foreign keys
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeMember',
|
||||
new TableForeignKey({
|
||||
columnNames: ['projectId'],
|
||||
referencedTableName: 'Project',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeMember',
|
||||
new TableForeignKey({
|
||||
columnNames: ['episodeId'],
|
||||
referencedTableName: 'AlertEpisode',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeMember',
|
||||
new TableForeignKey({
|
||||
columnNames: ['alertId'],
|
||||
referencedTableName: 'Alert',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
public async down(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.dropTable('AlertEpisodeMember');
|
||||
}
|
||||
}
|
||||
```
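
Because `idx_episode_member_unique` rejects a second row for the same `(episodeId, alertId)` pair, code that adds members may want to check for an existing pair before inserting. The sketch below shows that check against an in-memory list. `EpisodeMember` and `addMember` are hypothetical, and the `addedBy` values are assumed, since the migration only defines the column as a 50-character string.

```typescript
// Hypothetical in-memory view of an AlertEpisodeMember row.
interface EpisodeMember {
  episodeId: string;
  alertId: string;
  addedBy: string; // e.g. "rule" or "manual" (assumed values)
  addedAt: Date;
}

// Adds a member unless the (episodeId, alertId) pair already exists.
// Returns true if a new member was added, false if it was a duplicate.
function addMember(members: EpisodeMember[], candidate: EpisodeMember): boolean {
  const exists = members.some(
    (m) => m.episodeId === candidate.episodeId && m.alertId === candidate.alertId
  );
  if (exists) {
    return false;
  }
  members.push(candidate);
  return true;
}
```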
|
||||
|
||||
---
|
||||
|
||||
### Migration 4: Add Episode Fields to Alert Table
|
||||
|
||||
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-AddEpisodeFieldsToAlert.ts`
|
||||
|
||||
```typescript
|
||||
import { MigrationInterface, QueryRunner, TableColumn, TableIndex, TableForeignKey } from 'typeorm';
|
||||
|
||||
export class AddEpisodeFieldsToAlert implements MigrationInterface {
|
||||
public async up(queryRunner: QueryRunner): Promise<void> {
|
||||
// Add episodeId column
|
||||
await queryRunner.addColumn(
|
||||
'Alert',
|
||||
new TableColumn({
|
||||
name: 'episodeId',
|
||||
type: 'uuid',
|
||||
isNullable: true,
|
||||
})
|
||||
);
|
||||
|
||||
// Add fingerprint column
|
||||
await queryRunner.addColumn(
|
||||
'Alert',
|
||||
new TableColumn({
|
||||
name: 'fingerprint',
|
||||
type: 'varchar',
|
||||
length: '64',
|
||||
isNullable: true,
|
||||
})
|
||||
);
|
||||
|
||||
// Add duplicateCount column
|
||||
await queryRunner.addColumn(
|
||||
'Alert',
|
||||
new TableColumn({
|
||||
name: 'duplicateCount',
|
||||
type: 'integer',
|
||||
default: 0,
|
||||
})
|
||||
);
|
||||
|
||||
// Add lastDuplicateAt column
|
||||
await queryRunner.addColumn(
|
||||
'Alert',
|
||||
new TableColumn({
|
||||
name: 'lastDuplicateAt',
|
||||
type: 'timestamp',
|
||||
isNullable: true,
|
||||
})
|
||||
);
|
||||
|
||||
// Create indexes
|
||||
await queryRunner.createIndex(
|
||||
'Alert',
|
||||
new TableIndex({
|
||||
name: 'idx_alert_episode',
|
||||
columnNames: ['episodeId'],
|
||||
where: '"episodeId" IS NOT NULL',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createIndex(
|
||||
'Alert',
|
||||
new TableIndex({
|
||||
name: 'idx_alert_fingerprint',
|
||||
columnNames: ['projectId', 'fingerprint'],
|
||||
where: '"fingerprint" IS NOT NULL',
|
||||
})
|
||||
);
|
||||
|
||||
// Create foreign key
|
||||
await queryRunner.createForeignKey(
|
||||
'Alert',
|
||||
new TableForeignKey({
|
||||
name: 'fk_alert_episode',
|
||||
columnNames: ['episodeId'],
|
||||
referencedTableName: 'AlertEpisode',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'SET NULL',
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
public async down(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.dropForeignKey('Alert', 'fk_alert_episode');
|
||||
await queryRunner.dropIndex('Alert', 'idx_alert_fingerprint');
|
||||
await queryRunner.dropIndex('Alert', 'idx_alert_episode');
|
||||
await queryRunner.dropColumn('Alert', 'lastDuplicateAt');
|
||||
await queryRunner.dropColumn('Alert', 'duplicateCount');
|
||||
await queryRunner.dropColumn('Alert', 'fingerprint');
|
||||
await queryRunner.dropColumn('Alert', 'episodeId');
|
||||
}
|
||||
}
|
||||
```
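
This section does not describe how `fingerprint`, `duplicateCount`, and `lastDuplicateAt` are meant to work together. One plausible reading is that a repeat alert with the same fingerprint updates the existing row instead of creating a new one. The sketch below illustrates that reading only; `AlertRow` and `recordDuplicate` are hypothetical names.

```typescript
// Hypothetical subset of the Alert columns added by this migration.
interface AlertRow {
  fingerprint: string | null;
  duplicateCount: number;
  lastDuplicateAt: Date | null;
}

// Find an existing alert with the same fingerprint and record the duplicate.
// Returns the updated alert, or null when no alert shares the fingerprint.
function recordDuplicate(
  existing: AlertRow[],
  fingerprint: string,
  now: Date
): AlertRow | null {
  for (const alert of existing) {
    if (alert.fingerprint === fingerprint) {
      alert.duplicateCount += 1;
      alert.lastDuplicateAt = now;
      return alert;
    }
  }
  return null;
}
```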
|
||||
|
||||
---
|
||||
|
||||
### Migration 5: Create Episode Join Tables
|
||||
|
||||
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateEpisodeJoinTables.ts`
|
||||
|
||||
```typescript
|
||||
import { MigrationInterface, QueryRunner, Table, TableForeignKey } from 'typeorm';
|
||||
|
||||
export class CreateEpisodeJoinTables implements MigrationInterface {
|
||||
public async up(queryRunner: QueryRunner): Promise<void> {
|
||||
// AlertEpisodeOwnerUser join table
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertEpisodeOwnerUser',
|
||||
columns: [
|
||||
{
|
||||
name: 'episodeId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
{
|
||||
name: 'userId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeOwnerUser',
|
||||
new TableForeignKey({
|
||||
columnNames: ['episodeId'],
|
||||
referencedTableName: 'AlertEpisode',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeOwnerUser',
|
||||
new TableForeignKey({
|
||||
columnNames: ['userId'],
|
||||
referencedTableName: 'User',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
// AlertEpisodeOwnerTeam join table
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertEpisodeOwnerTeam',
|
||||
columns: [
|
||||
{
|
||||
name: 'episodeId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
{
|
||||
name: 'teamId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeOwnerTeam',
|
||||
new TableForeignKey({
|
||||
columnNames: ['episodeId'],
|
||||
referencedTableName: 'AlertEpisode',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeOwnerTeam',
|
||||
new TableForeignKey({
|
||||
columnNames: ['teamId'],
|
||||
referencedTableName: 'Team',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
// AlertEpisodeLabel join table
|
||||
await queryRunner.createTable(
|
||||
new Table({
|
||||
name: 'AlertEpisodeLabel',
|
||||
columns: [
|
||||
{
|
||||
name: 'episodeId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
{
|
||||
name: 'labelId',
|
||||
type: 'uuid',
|
||||
isPrimary: true,
|
||||
},
|
||||
],
|
||||
}),
|
||||
true
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeLabel',
|
||||
new TableForeignKey({
|
||||
columnNames: ['episodeId'],
|
||||
referencedTableName: 'AlertEpisode',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
|
||||
await queryRunner.createForeignKey(
|
||||
'AlertEpisodeLabel',
|
||||
new TableForeignKey({
|
||||
columnNames: ['labelId'],
|
||||
referencedTableName: 'Label',
|
||||
referencedColumnNames: ['_id'],
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
public async down(queryRunner: QueryRunner): Promise<void> {
|
||||
await queryRunner.dropTable('AlertEpisodeLabel');
|
||||
await queryRunner.dropTable('AlertEpisodeOwnerTeam');
|
||||
await queryRunner.dropTable('AlertEpisodeOwnerUser');
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Feature Flags
|
||||
|
||||
### Project-Level Settings
|
||||
|
||||
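Add these settings to the Project model (the service below reads the master switch as `alertGroupingEnabled`), or create a separate `AlertGroupingSettings` model: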
|
||||
|
||||
```typescript
|
||||
interface AlertGroupingSettings {
|
||||
// Master switch
|
||||
groupingEnabled: boolean;
|
||||
|
||||
// Auto-create episodes for new alerts
|
||||
autoCreateEpisodes: boolean;
|
||||
|
||||
// Default time window for grouping (minutes)
|
||||
defaultTimeWindowMinutes: number;
|
||||
}
|
||||
```
|
||||
|
||||
### Implementation
|
||||
|
||||
```typescript
|
||||
// /Common/Server/Services/AlertGroupingSettingsService.ts
|
||||
|
||||
export default class AlertGroupingSettingsService {
|
||||
public static async isGroupingEnabled(projectId: ObjectID): Promise<boolean> {
|
||||
const settings = await ProjectService.findOneById({
|
||||
id: projectId,
|
||||
select: { alertGroupingEnabled: true },
|
||||
props: { isRoot: true },
|
||||
});
|
||||
|
||||
return settings?.alertGroupingEnabled ?? false;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Usage in GroupingEngine
|
||||
|
||||
```typescript
|
||||
// In GroupingEngine.processAlert():
|
||||
const isEnabled = await AlertGroupingSettingsService.isGroupingEnabled(projectId);
|
||||
if (!isEnabled) {
|
||||
return { shouldGroup: false, isNewEpisode: false };
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rollout Strategy
|
||||
|
||||
### Phase 1: Internal Alpha
|
||||
|
||||
**Duration:** 1 week
|
||||
|
||||
**Scope:**
|
||||
- Enable for internal test projects only
|
||||
- Feature flag: `ALERT_GROUPING_INTERNAL_ONLY=true`
|
||||
|
||||
**Validation:**
|
||||
- Verify migrations run successfully
|
||||
- Test basic grouping flow
|
||||
- Check performance metrics
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: Beta (Opt-in)
|
||||
|
||||
**Duration:** 2 weeks
|
||||
|
||||
**Scope:**
|
||||
- Available to all projects but disabled by default
|
||||
- Users must explicitly enable in Settings
|
||||
- Show "Beta" badge on Episodes page
|
||||
|
||||
**Communication:**
|
||||
- In-app announcement
|
||||
- Documentation published
|
||||
- Support team briefed
|
||||
|
||||
**Monitoring:**
|
||||
- Episode creation rate
|
||||
- Grouping accuracy feedback
|
||||
- Performance metrics
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: General Availability
|
||||
|
||||
**Duration:** Ongoing
|
||||
|
||||
**Scope:**
|
||||
- Enabled by default for new projects
|
||||
- Existing projects can opt-in via Settings
|
||||
|
||||
**Milestones:**
|
||||
- Remove "Beta" badge
|
||||
- Enable by default for all new projects
|
||||
- Provide migration tool for existing alerts
|
||||
|
||||
---
|
||||
|
||||
## Backward Compatibility
|
||||
|
||||
### No Breaking Changes
|
||||
|
||||
1. **Existing alerts unchanged** - episodeId is nullable, defaults to null
|
||||
2. **Existing API unchanged** - new fields added but not required
|
||||
3. **Opt-in only** - grouping stays disabled until rules are created
|
||||
|
||||
### Gradual Adoption
|
||||
|
||||
1. Users create grouping rules when ready
|
||||
2. Only new alerts are grouped (after rule creation)
|
||||
3. No retroactive grouping unless explicitly triggered
|
||||
|
||||
---
|
||||
|
||||
## Data Migration (Optional)
|
||||
|
||||
### Retroactive Alert Grouping
|
||||
|
||||
For users who want to group existing alerts:
|
||||
|
||||
```typescript
|
||||
// /Worker/Jobs/AlertEpisode/RetroactiveGrouping.ts
|
||||
|
||||
export async function retroactivelyGroupAlerts(
|
||||
projectId: ObjectID,
|
||||
ruleId: ObjectID,
|
||||
startDate: Date,
|
||||
endDate: Date
|
||||
): Promise<void> {
|
||||
// Get rule
|
||||
const rule = await AlertGroupingRuleService.findOneById({ id: ruleId });
|
||||
|
||||
// Get alerts in date range
|
||||
const alerts = await AlertService.findBy({
|
||||
query: {
|
||||
projectId,
|
||||
createdAt: QueryHelper.between(startDate, endDate),
|
||||
episodeId: QueryHelper.isNull(),
|
||||
},
|
||||
select: { ... },
|
||||
props: { isRoot: true },
|
||||
});
|
||||
|
||||
// Group alerts
|
||||
for (const alert of alerts) {
|
||||
await GroupingEngine.processAlert(alert, projectId);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This would be triggered via Admin UI or API endpoint.
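
The replay loop above processes alerts in whatever order the query returns them. If the grouping rules depend on arrival order, as time-window rules presumably do, it may help to sort by `createdAt` and work in bounded batches. The sketch below is illustrative only: `DatedAlert` and `chronologicalBatches` are hypothetical names, and whether `GroupingEngine.processAlert` uses the alert's own timestamp is not stated in the plan.

```typescript
// Hypothetical minimal alert shape for ordering purposes.
interface DatedAlert {
  _id: string;
  createdAt: Date;
}

// Sort oldest first, then split into batches of at most batchSize alerts.
function chronologicalBatches<T extends DatedAlert>(
  alerts: T[],
  batchSize: number
): T[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = alerts
    .slice()
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
  const batches: T[][] = [];
  for (let i = 0; i < sorted.length; i += batchSize) {
    batches.push(sorted.slice(i, i + batchSize));
  }
  return batches;
}
```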
|
||||
|
||||
---
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
### Database Rollback
|
||||
|
||||
If issues are discovered, the migrations can be rolled back:
|
||||
|
||||
```bash
|
||||
npm run migration:revert
|
||||
```
|
||||
|
||||
### Feature Flag Disable
|
||||
|
||||
Immediately disable grouping for all projects:
|
||||
|
||||
```bash
|
||||
# Set environment variable
|
||||
export ALERT_GROUPING_GLOBAL_DISABLE=true
|
||||
```
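
The plan mentions three switches: `ALERT_GROUPING_GLOBAL_DISABLE`, `ALERT_GROUPING_INTERNAL_ONLY`, and the per-project setting. How they combine is not spelled out. The sketch below assumes the global kill switch wins, then the internal-only restriction, then the project setting. `GroupingFlags`, `isGroupingActive`, `parseBooleanEnv`, and the `isInternalProject` marker are all hypothetical.

```typescript
// Hypothetical bundle of the flags described in this plan.
interface GroupingFlags {
  globalDisable: boolean; // from ALERT_GROUPING_GLOBAL_DISABLE
  internalOnly: boolean; // from ALERT_GROUPING_INTERNAL_ONLY
  isInternalProject: boolean; // assumed marker for internal test projects
  projectEnabled: boolean; // project-level alertGroupingEnabled
}

// Assumed precedence: global disable, then internal-only, then project setting.
function isGroupingActive(flags: GroupingFlags): boolean {
  if (flags.globalDisable) {
    return false;
  }
  if (flags.internalOnly && !flags.isInternalProject) {
    return false;
  }
  return flags.projectEnabled;
}

// Treat only the string "true" (any case, surrounding spaces ignored) as enabled.
function parseBooleanEnv(value: string | undefined): boolean {
  return value !== undefined && value.trim().toLowerCase() === "true";
}
```

Keeping the decision in one pure function makes the precedence easy to unit test before it is wired into `GroupingEngine.processAlert()`.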
|
||||
|
||||
### Data Preservation
|
||||
|
||||
- Episodes and members remain in the database when grouping is disabled via the feature flag (reverting the migrations drops these tables)
|
||||
- Alerts keep episodeId reference
|
||||
- Can be re-enabled later without data loss
|
||||
|
||||
---
|
||||
|
||||
## Monitoring & Alerts
|
||||
|
||||
### Key Metrics
|
||||
|
||||
| Metric | Description | Threshold |
|
||||
|--------|-------------|-----------|
|
||||
| `episode_creation_rate` | Episodes created per hour | Monitor for anomalies |
|
||||
| `grouping_latency_p99` | Time to group an alert | < 50ms |
|
||||
| `episode_alert_ratio` | Avg alerts per episode | > 2 (effective grouping) |
|
||||
| `grouping_engine_errors` | Errors in grouping | 0 |
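
Two of the metrics in the table above are simple to compute from raw samples. The sketch below shows a nearest-rank percentile for `grouping_latency_p99` and an average for `episode_alert_ratio`. The function names are hypothetical, and the plan does not say which percentile method the monitoring stack uses, so nearest-rank is an assumption.

```typescript
// Nearest-rank percentile: the smallest sample with at least p percent
// of all samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.min(
    sorted.length,
    Math.max(1, Math.ceil((p * sorted.length) / 100))
  );
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("rank out of range");
  }
  return value;
}

// Average number of alerts per episode; 0 when there are no episodes.
function alertsPerEpisode(alertCount: number, episodeCount: number): number {
  return episodeCount === 0 ? 0 : alertCount / episodeCount;
}
```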
|
||||
|
||||
### Dashboards
|
||||
|
||||
Create monitoring dashboards for:
|
||||
- Episode creation over time
|
||||
- Grouping rule effectiveness
|
||||
- Performance metrics
|
||||
- Error rates
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
### Pre-Migration
|
||||
- [ ] Review migration scripts
|
||||
- [ ] Test migrations on staging
|
||||
- [ ] Backup production database
|
||||
- [ ] Prepare rollback procedure
|
||||
|
||||
### Migration
|
||||
- [ ] Run migrations in order
|
||||
- [ ] Verify table creation
|
||||
- [ ] Verify index creation
|
||||
- [ ] Verify foreign keys
|
||||
|
||||
### Post-Migration
|
||||
- [ ] Deploy updated API
|
||||
- [ ] Deploy updated Worker
|
||||
- [ ] Deploy updated Dashboard
|
||||
- [ ] Enable feature flags for alpha
|
||||
- [ ] Monitor metrics
|
||||
|
||||
### GA Release
|
||||
- [ ] Remove beta badges
|
||||
- [ ] Update documentation
|
||||
- [ ] Enable for new projects
|
||||
- [ ] Announce to users
|
||||
@@ -1,117 +0,0 @@
|
||||
# Alert Grouping / Episodes Implementation Plan
|
||||
|
||||
## Overview
|
||||
|
||||
This sub-plan details the implementation of Alert Grouping and Episodes functionality for OneUptime. This feature groups related alerts into logical containers called "Episodes" to reduce noise and help operators focus on root causes rather than individual symptoms.
|
||||
|
||||
## Documents
|
||||
|
||||
| Document | Description |
|
||||
|----------|-------------|
|
||||
| [1-DataModels.md](./1-DataModels.md) | Database models and schema definitions |
|
||||
| [2-Backend.md](./2-Backend.md) | Backend services and grouping engine |
|
||||
| [3-API.md](./3-API.md) | REST API endpoints |
|
||||
| [4-UI.md](./4-UI.md) | Frontend components and pages |
|
||||
| [5-Migration.md](./5-Migration.md) | Database migrations and rollout |
|
||||
|
||||
## Feature Summary
|
||||
|
||||
### What is an Episode?
|
||||
|
||||
An **Episode** is a container that groups related alerts together. Instead of seeing 50 individual "connection timeout" alerts, operators see one episode: "Database Connectivity Issues (50 alerts)".
|
||||
|
||||
### Key Capabilities
|
||||
|
||||
1. **Automatic Grouping** - Rules-based grouping of alerts into episodes
|
||||
2. **Time-Window Grouping** - Group alerts occurring within N minutes
|
||||
3. **Field-Based Grouping** - Group by monitor, severity, labels, etc.
|
||||
4. **Manual Management** - Merge, split, add/remove alerts from episodes
|
||||
5. **Episode Lifecycle** - Active → Acknowledged → Resolved states
|
||||
6. **Root Cause Tracking** - Document root cause analysis per episode
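
As an illustration of capability 2, time-window grouping can be sketched as a single pass over alerts sorted by arrival time. `TimedAlert` and `groupByTimeWindow` are hypothetical names. The sketch also assumes a sliding window, where an alert joins the current group if it arrives within N minutes of the previous alert in that group; the plan could equally mean a fixed window measured from the first alert.

```typescript
// Hypothetical minimal alert shape.
interface TimedAlert {
  id: string;
  createdAt: Date;
}

// Group alerts so that each alert is within windowMinutes of the previous
// alert in its group (sliding window, assumed).
function groupByTimeWindow(
  alerts: TimedAlert[],
  windowMinutes: number
): TimedAlert[][] {
  const windowMs = windowMinutes * 60 * 1000;
  const sorted = alerts
    .slice()
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
  const groups: TimedAlert[][] = [];
  let current: TimedAlert[] = [];
  let lastTime = 0;
  for (const alert of sorted) {
    const time = alert.createdAt.getTime();
    if (current.length > 0 && time - lastTime <= windowMs) {
      current.push(alert);
    } else {
      // Gap exceeded the window (or this is the first alert): start a new group.
      current = [alert];
      groups.push(current);
    }
    lastTime = time;
  }
  return groups;
}
```

With a 5-minute window, alerts at minutes 0, 2, and 4 fall into one group, and an alert at minute 20 starts a second group.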
|
||||
|
||||
### User Stories
|
||||
|
||||
```
|
||||
As an operator, I want to see related alerts grouped together
|
||||
so that I can focus on root causes instead of individual symptoms.
|
||||
|
||||
As an operator, I want to acknowledge an entire episode at once
|
||||
so that I don't have to acknowledge each alert individually.
|
||||
|
||||
As a team lead, I want to configure grouping rules
|
||||
so that alerts are automatically organized by our team's workflow.
|
||||
|
||||
As an operator, I want to document the root cause of an episode
|
||||
so that the team can learn from past incidents.
|
||||
```
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Core Models & Basic Grouping (Week 1-2)
|
||||
|
||||
- [ ] Create AlertEpisode model
|
||||
- [ ] Create AlertEpisodeMember model
|
||||
- [ ] Create AlertGroupingRule model
|
||||
- [ ] Implement basic time-window grouping engine
|
||||
- [ ] Integrate with alert creation flow
|
||||
|
||||
### Phase 2: Episode Management (Week 3)
|
||||
|
||||
- [ ] Episode state management (acknowledge, resolve)
|
||||
- [ ] Episode assignment (owners, teams)
|
||||
- [ ] Episode timeline tracking
|
||||
- [ ] Manual alert management (add/remove)
|
||||
|
||||
### Phase 3: UI - List & Detail Views (Week 4-5)
|
||||
|
||||
- [ ] Episodes list page
|
||||
- [ ] Episode detail page
|
||||
- [ ] Episode actions (acknowledge, resolve, assign)
|
||||
- [ ] Alert-to-episode linking in alerts table
|
||||
|
||||
### Phase 4: UI - Configuration (Week 6)
|
||||
|
||||
- [ ] Grouping rules list page
|
||||
- [ ] Create/edit grouping rule form
|
||||
- [ ] Rule testing functionality
|
||||
- [ ] Episode badge in alerts table
|
||||
|
||||
### Phase 5: Advanced Features (Week 7-8)
|
||||
|
||||
- [ ] Field-based grouping
|
||||
- [ ] Episode merge/split functionality
|
||||
- [ ] Episode notifications
|
||||
- [ ] Analytics and metrics
|
||||
|
||||
## Dependencies
|
||||
|
||||
### Existing Components Used
|
||||
|
||||
- `Alert` model and `AlertService`
|
||||
- `AlertState` and `AlertStateTimeline`
|
||||
- Dashboard routing and layout components
|
||||
- ModelTable and ModelForm components
|
||||
- On-call notification system
|
||||
|
||||
### New Components Created
|
||||
|
||||
- `AlertEpisode` model
|
||||
- `AlertEpisodeMember` model
|
||||
- `AlertGroupingRule` model
|
||||
- `GroupingEngine` service
|
||||
- Episode UI pages and components
|
||||
|
||||
## Success Metrics
|
||||
|
||||
| Metric | Target |
|--------|--------|
| Alert-to-episode ratio | 5:1 or higher |
| Episode acknowledgment time | 50% faster than individual alerts |
| User adoption | 80% of projects with grouping rules |
| Processing latency | < 30ms added to alert creation |
|
||||
|
||||
## References
|
||||
|
||||
- [Parent Plan: AlertEngine.md](../AlertEngine.md)
|
||||
- [Splunk ITSI Episode Review](https://docs.splunk.com/Documentation/ITSI)
|
||||
- [PagerDuty Alert Grouping](https://support.pagerduty.com/docs/alert-grouping)
|
||||
61
Docs/Plan/AlertGrouping/Summary.md
Normal file
@@ -0,0 +1,61 @@
|
||||
# Alert Grouping / Episodes - Summary
|
||||
|
||||
## What is Alert Grouping?
|
||||
|
||||
Alert Grouping is a feature that automatically combines related alerts into logical containers called **Episodes**. Instead of seeing 50 individual "connection timeout" alerts, operators see one episode: "Database Connectivity Issues (50 alerts)".
|
||||
|
||||
## Key Capabilities
|
||||
|
||||
1. **Automatic Grouping** - Rules-based grouping of alerts into episodes
|
||||
2. **Time-Window Grouping** - Group alerts occurring within N minutes
|
||||
3. **Field-Based Grouping** - Group by monitor, monitor custom fields, severity, labels, etc.
|
||||
4. **Manual Management** - Merge, split, add/remove alerts from episodes
|
||||
5. **Episode Lifecycle** - Active → Acknowledged → Resolved states. Episode states should be linked to alert states.
|
||||
6. **Root Cause Tracking** - Document root cause analysis per episode. This is a placeholder field for the user to fill out. A "Generate with AI" action could also summarize the episode from the root causes of all the alerts in it.
|
||||
7. **Flapping Prevention** - Grace periods before resolution and reopen windows
|
||||
|
||||
## Data Models
|
||||
|
||||
### Three New Models
|
||||
|
||||
| Model | Purpose |
|-------|---------|
| **AlertEpisode** | Container for grouped alerts (title, state, severity, timing, ownership) |
| **AlertEpisodeMember** | Links alerts to episodes with metadata (addedBy, addedAt, similarityScore) |
| **AlertGroupingRule** | Configures automatic grouping behavior (match criteria, grouping config, priority) |
|
||||
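The three models in the table above could be sketched as TypeScript interfaces. This is only an illustration: the field groups (title, state, severity, timing, ownership, `addedBy`, `addedAt`, `similarityScore`, match criteria, grouping config, priority) come from the table, while the exact property names, the types, and the "lower number is evaluated first" priority ordering are assumptions.

```typescript
// Hypothetical shapes; property names and types are assumptions, not the final schema.
type EpisodeState = "Active" | "Acknowledged" | "Resolved";

interface AlertEpisode {
  id: string;
  title: string;
  state: EpisodeState;
  severityId: string;
  startedAt: Date; // timing
  lastActivityAt: Date; // timing
  resolvedAt?: Date; // timing
  ownerUserIds: string[]; // ownership
}

interface AlertEpisodeMember {
  episodeId: string;
  alertId: string;
  addedBy: "rule" | "manual";
  addedAt: Date;
  similarityScore?: number;
}

interface AlertGroupingRule {
  id: string;
  priority: number; // assumption: lower number is evaluated first
  matchCriteria: { severityIds?: string[]; labelIds?: string[] };
  groupingConfig: { timeWindowMinutes?: number; groupByFields?: string[] };
}

// Returns a new array ordered by priority without mutating the input.
function sortRulesByPriority(rules: AlertGroupingRule[]): AlertGroupingRule[] {
  return [...rules].sort(
    (a: AlertGroupingRule, b: AlertGroupingRule) => a.priority - b.priority,
  );
}
```

Keeping membership in a separate `AlertEpisodeMember` record, rather than an array on the episode, is what lets per-alert metadata such as `addedBy` be stored.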
|
||||
### Alert Model Enhancements
|
||||
|
||||
- `episodeId` - Link to parent episode
|
||||
- `fingerprint` - Hash for deduplication
|
||||
- `duplicateCount` - Number of duplicates suppressed
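
How `fingerprint` and `duplicateCount` might interact can be sketched as follows. The plan does not say which fields feed the hash or which hash function is used; the choice of monitor, severity, and normalized title, and the FNV-1a style hash, are assumptions made purely for illustration.

```typescript
// Hypothetical alert shape; only `fingerprint` and `duplicateCount` are named in the plan.
interface DedupAlert {
  monitorId: string;
  severityId: string;
  title: string;
  fingerprint: string;
  duplicateCount: number;
}

type AlertInput = Pick<DedupAlert, "monitorId" | "severityId" | "title">;

// FNV-1a style 32-bit hash over UTF-16 code units; a stand-in for whatever hash is chosen.
function hash32(input: string): string {
  let hash: number = 0x811c9dc5;
  for (let i: number = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Assumption: two alerts are duplicates when monitor, severity, and normalized title match.
function computeFingerprint(alert: AlertInput): string {
  const parts: string[] = [
    alert.monitorId,
    alert.severityId,
    alert.title.trim().toLowerCase(),
  ];
  return hash32(parts.join("|"));
}

// Either bumps duplicateCount on the existing alert or stores a new one.
function ingestAlert(store: Map<string, DedupAlert>, incoming: AlertInput): DedupAlert {
  const fingerprint: string = computeFingerprint(incoming);
  const existing: DedupAlert | undefined = store.get(fingerprint);
  if (existing) {
    existing.duplicateCount += 1;
    return existing;
  }
  const created: DedupAlert = { ...incoming, fingerprint, duplicateCount: 0 };
  store.set(fingerprint, created);
  return created;
}
```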
|
||||
|
||||
## Grouping Types
|
||||
|
||||
| Type | Description |
|------|-------------|
| **Time Window** | Groups alerts within N minutes of each other |
| **Field-Based** | Groups by matching fields (monitor, severity, labels) |
| **Smart** | ML-based similarity matching (future) |
|
||||
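A minimal sketch of how the Time Window and Field-Based types could combine when an alert arrives. The names `assignToEpisode` and `groupKeyFor`, the in-memory episode list, and the choice to measure the window from the episode's last activity (a rolling window) are all assumptions; the plan only states that alerts within N minutes of each other, or with matching fields, are grouped.

```typescript
// Hypothetical shapes for illustration only.
interface IncomingAlert {
  id: string;
  monitorId: string;
  severityId: string;
  createdAt: Date;
}

interface OpenEpisode {
  id: string;
  groupKey: string;
  lastActivityAt: Date;
  alertIds: string[];
}

type GroupField = "monitorId" | "severityId";

// Field-based part: alerts with the same values for the chosen fields share a key.
// An empty field list gives every alert the same key, i.e. pure time-window grouping.
function groupKeyFor(alert: IncomingAlert, fields: GroupField[]): string {
  return fields.map((field: GroupField) => `${field}=${alert[field]}`).join("&");
}

// Time-window part: join an episode with the same key if it was active
// within `windowMinutes` of this alert; otherwise start a new episode.
function assignToEpisode(
  alert: IncomingAlert,
  episodes: OpenEpisode[],
  fields: GroupField[],
  windowMinutes: number,
): OpenEpisode {
  const key: string = groupKeyFor(alert, fields);
  const windowMs: number = windowMinutes * 60 * 1000;
  const match: OpenEpisode | undefined = episodes.find((episode: OpenEpisode) => {
    const gapMs: number = alert.createdAt.getTime() - episode.lastActivityAt.getTime();
    return episode.groupKey === key && gapMs <= windowMs;
  });
  if (match) {
    match.alertIds.push(alert.id);
    match.lastActivityAt = alert.createdAt;
    return match;
  }
  const created: OpenEpisode = {
    id: `episode-${episodes.length + 1}`,
    groupKey: key,
    lastActivityAt: alert.createdAt,
    alertIds: [alert.id],
  };
  episodes.push(created);
  return created;
}
```

With a rolling window, a steady trickle of alerts keeps one episode open indefinitely; a fixed window measured from the episode start would be the alternative design.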
|
||||
## Flapping Prevention
|
||||
|
||||
- **resolveDelayMinutes** - Grace period before auto-resolving (prevents rapid state changes)
|
||||
- **reopenWindowMinutes** - Window after resolution during which the episode is reopened instead of creating a new one
|
||||
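The two settings above can be read as two time checks. The sketch below assumes the episode records when its last open alert resolved (`allAlertsResolvedAt`) and when the episode itself resolved (`resolvedAt`); those field names and both function names are hypothetical.

```typescript
// Only resolveDelayMinutes and reopenWindowMinutes are named in the plan.
interface FlappingConfig {
  resolveDelayMinutes: number;
  reopenWindowMinutes: number;
}

// Hypothetical timing fields on an episode.
interface EpisodeTiming {
  allAlertsResolvedAt?: Date;
  resolvedAt?: Date;
}

const MS_PER_MINUTE: number = 60 * 1000;

// Auto-resolve only once the grace period has fully elapsed.
function shouldResolve(episode: EpisodeTiming, config: FlappingConfig, now: Date): boolean {
  if (!episode.allAlertsResolvedAt) {
    return false;
  }
  const elapsedMs: number = now.getTime() - episode.allAlertsResolvedAt.getTime();
  return elapsedMs >= config.resolveDelayMinutes * MS_PER_MINUTE;
}

// A matching alert arriving inside the reopen window reopens the resolved
// episode; after the window, the caller would create a new episode instead.
function shouldReopen(
  episode: EpisodeTiming,
  config: FlappingConfig,
  alertCreatedAt: Date,
): boolean {
  if (!episode.resolvedAt) {
    return false;
  }
  const elapsedMs: number = alertCreatedAt.getTime() - episode.resolvedAt.getTime();
  return elapsedMs >= 0 && elapsedMs <= config.reopenWindowMinutes * MS_PER_MINUTE;
}
```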
|
||||
## On-Call Policy Resolution
|
||||
|
||||
Priority chain for notifications:
|
||||
1. If the episode was created via a grouping rule that is linked to an on-call policy, that rule's on-call policy is used.
2. If the alert has its own on-call policy, it is used as well, alongside the grouping rule's on-call policy.
3. If neither the grouping rule nor the alert has an on-call policy, no notifications are sent.
|
||||
|
||||
When an alert joins an episode, the alert policy (if any) is executed as normal. The episode's on-call policy is also executed. This means that if an alert has an on-call policy, notifications may be sent twice - once for the alert and once for the episode. If the episode policy is executed and then a new alert joins the episode, the episode's on-call policy is NOT re-executed.
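
The behavior described above could be sketched like this. `EpisodeNotifyState`, `onAlertJoined`, and the `policyExecuted` flag are hypothetical names; the sketch assumes policies are identified by string IDs and that the episode remembers whether its own policy has already run.

```typescript
// Hypothetical state kept per episode.
interface EpisodeNotifyState {
  rulePolicyIds: string[]; // on-call policies linked to the grouping rule
  policyExecuted: boolean; // true once the episode-level policy has run
}

interface PoliciesToRun {
  alertPolicyIds: string[]; // executed for the alert, as normal
  episodePolicyIds: string[]; // executed for the episode, at most once
}

// Called each time an alert joins an episode (including the first alert).
function onAlertJoined(episode: EpisodeNotifyState, alertPolicyIds: string[]): PoliciesToRun {
  // The episode policy runs only once; later alerts do not re-trigger it.
  const episodePolicyIds: string[] = episode.policyExecuted ? [] : [...episode.rulePolicyIds];
  if (episodePolicyIds.length > 0) {
    episode.policyExecuted = true;
  }
  // The alert's own policy always runs, so an alert with a policy may
  // notify twice: once for itself and once for the episode.
  return { alertPolicyIds: [...alertPolicyIds], episodePolicyIds };
}
```

If both lists come back empty, neither the rule nor the alert has a policy and no notifications are sent.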
|
||||
|
||||
### Worker Jobs
|
||||
- **EpisodeAutoResolve** - Resolves episodes when all of their alerts are resolved
|
||||
- **EpisodeBreakInactive** - Resolves episodes after inactivity period
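
As a rough sketch, each worker job reduces to a selection predicate over open episodes. The `EpisodeSnapshot` shape, the function names, and the `inactivityMinutes` parameter are assumptions; the plan does not specify how the inactivity period is configured or how the jobs are scheduled.

```typescript
// Hypothetical view of an episode as a worker job might load it.
interface EpisodeSnapshot {
  id: string;
  resolved: boolean;
  lastActivityAt: Date;
  memberAlertsResolved: boolean[]; // one flag per member alert
}

// EpisodeAutoResolve: open episodes with at least one alert, all of them resolved.
function episodesToAutoResolve(episodes: EpisodeSnapshot[]): string[] {
  return episodes
    .filter(
      (episode: EpisodeSnapshot) =>
        !episode.resolved &&
        episode.memberAlertsResolved.length > 0 &&
        episode.memberAlertsResolved.every((isResolved: boolean) => isResolved),
    )
    .map((episode: EpisodeSnapshot) => episode.id);
}

// EpisodeBreakInactive: open episodes with no activity for `inactivityMinutes`.
function episodesToBreakInactive(
  episodes: EpisodeSnapshot[],
  now: Date,
  inactivityMinutes: number,
): string[] {
  const cutoffMs: number = now.getTime() - inactivityMinutes * 60 * 1000;
  return episodes
    .filter(
      (episode: EpisodeSnapshot) =>
        !episode.resolved && episode.lastActivityAt.getTime() <= cutoffMs,
    )
    .map((episode: EpisodeSnapshot) => episode.id);
}
```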
|
||||
|
||||
## Database Migrations
|
||||
|
||||
Please do not write Database migrations. I will do that manually.
|
||||