feat(AlertGrouping): Remove outdated migration and implementation documents; add summary for Alert Grouping feature

- Deleted the detailed migration plan (5-Migration.md) and implementation plan (README.md) for Alert Grouping.
- Introduced a new summary document (Summary.md) outlining key capabilities, data models, grouping types, and on-call policy resolution for the Alert Grouping feature.
Nawaz Dhandala
2026-01-20 18:32:31 +00:00
parent 8e8bc54aed
commit e699e323cb
7 changed files with 61 additions and 4835 deletions

@@ -1,606 +0,0 @@
# API Design for Alert Grouping
## Overview
This document defines the REST API endpoints for Alert Grouping / Episodes functionality.
## Base URLs
All endpoints are prefixed with the project scope:
```
/api/project/{projectId}/alert-episode
/api/project/{projectId}/alert-grouping-rule
```
---
## Episodes API
### List Episodes
```http
GET /api/project/{projectId}/alert-episode
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `currentAlertStateId` | ObjectID | Filter by state |
| `alertSeverityId` | ObjectID | Filter by severity |
| `groupingRuleId` | ObjectID | Filter by grouping rule |
| `startedAt` | DateRange | Filter by start time |
| `search` | string | Search in title/description |
| `limit` | number | Results per page (default: 10) |
| `skip` | number | Pagination offset |
| `sort` | string | Sort field (default: `-lastActivityAt`) |
**Response:**
```json
{
"data": [
{
"_id": "episode-id-1",
"episodeNumber": 42,
"title": "Database Connectivity Issues",
"description": "Multiple database connection failures",
"currentAlertState": {
"_id": "state-id",
"name": "Active",
"color": "#FF0000"
},
"alertSeverity": {
"_id": "severity-id",
"name": "Critical",
"color": "#FF0000"
},
"alertCount": 15,
"uniqueMonitorCount": 3,
"startedAt": "2026-01-20T10:45:00Z",
"lastActivityAt": "2026-01-20T10:57:00Z",
"groupingRule": {
"_id": "rule-id",
"name": "Database alerts - 5min"
}
}
],
"count": 55,
"skip": 0,
"limit": 10
}
```
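As an illustration of how a client might assemble the query string from the parameters above, here is a minimal sketch. The helper `buildEpisodeListUrl` and the `EpisodeListQuery` type are hypothetical names, not part of the API; `startedAt` is left out because this document does not say how a `DateRange` is encoded.

```typescript
// Hypothetical client-side helper; only the parameter names and defaults
// come from the table above.
interface EpisodeListQuery {
  currentAlertStateId?: string;
  alertSeverityId?: string;
  groupingRuleId?: string;
  search?: string;
  limit?: number; // documented default: 10
  skip?: number;
  sort?: string; // documented default: -lastActivityAt
}

function buildEpisodeListUrl(
  projectId: string,
  query: EpisodeListQuery = {},
): string {
  // Start from the documented defaults, then let caller-supplied values win.
  const merged: EpisodeListQuery = {
    limit: 10,
    skip: 0,
    sort: "-lastActivityAt",
    ...query,
  };

  const pairs: string[] = [];
  const keys = Object.keys(merged) as Array<keyof EpisodeListQuery>;
  for (const key of keys) {
    const value = merged[key];
    if (value === undefined || value === "") {
      continue; // omit filters that were not set
    }
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }

  const base = `/api/project/${encodeURIComponent(projectId)}/alert-episode`;
  return pairs.length > 0 ? `${base}?${pairs.join("&")}` : base;
}
```

Calling `buildEpisodeListUrl("p1", { search: "db errors", limit: 25 })` keeps the default `skip` and `sort` while overriding `limit`.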
---
### Get Episode Details
```http
GET /api/project/{projectId}/alert-episode/{episodeId}
```
**Response:**
```json
{
"_id": "episode-id-1",
"episodeNumber": 42,
"title": "Database Connectivity Issues",
"description": "Multiple database connection failures",
"currentAlertState": {
"_id": "state-id",
"name": "Active",
"color": "#FF0000"
},
"alertSeverity": {
"_id": "severity-id",
"name": "Critical",
"color": "#FF0000"
},
"alertCount": 15,
"uniqueMonitorCount": 3,
"startedAt": "2026-01-20T10:45:00Z",
"lastActivityAt": "2026-01-20T10:57:00Z",
"acknowledgedAt": null,
"resolvedAt": null,
"groupingRule": {
"_id": "rule-id",
"name": "Database alerts - 5min"
},
"ownerUsers": [],
"ownerTeams": [],
"labels": [],
"rootCause": null
}
```
---
### Create Episode (Manual)
```http
POST /api/project/{projectId}/alert-episode
```
**Request Body:**
```json
{
"title": "Custom Episode Title",
"description": "Optional description"
}
```
**Response:** Created episode object
---
### Update Episode
```http
PUT /api/project/{projectId}/alert-episode/{episodeId}
```
**Request Body:**
```json
{
"title": "Updated Title",
"description": "Updated description",
"ownerUsers": ["user-id-1"],
"ownerTeams": ["team-id-1"],
"labels": ["label-id-1"],
"rootCause": "Database connection pool exhausted"
}
```
---
### Delete Episode
```http
DELETE /api/project/{projectId}/alert-episode/{episodeId}
```
Deleting an episode removes all member relationships but does NOT delete the alerts themselves. Alerts will have their `episodeId` set to null.
---
### Acknowledge Episode
```http
POST /api/project/{projectId}/alert-episode/{episodeId}/acknowledge
```
**Request Body:**
```json
{
  "acknowledgeAlerts": true
}
```
`acknowledgeAlerts` is optional; when `true`, all alerts in the episode are acknowledged as well.
**Response:**
```json
{
"_id": "episode-id",
"currentAlertState": {
"_id": "acknowledged-state-id",
"name": "Acknowledged"
},
"acknowledgedAt": "2026-01-20T11:00:00Z"
}
```
---
### Resolve Episode
```http
POST /api/project/{projectId}/alert-episode/{episodeId}/resolve
```
**Request Body:**
```json
{
  "rootCause": "Database server restarted",
  "resolveAlerts": true
}
```
`resolveAlerts` is optional; when `true`, all alerts in the episode are resolved as well.
---
### Get Episode Alerts
```http
GET /api/project/{projectId}/alert-episode/{episodeId}/alerts
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `limit` | number | Results per page |
| `skip` | number | Pagination offset |
| `sort` | string | Sort field |
**Response:**
```json
{
"data": [
{
"_id": "alert-id-1",
"alertNumber": 127,
"title": "MySQL connection pool exhausted",
"currentAlertState": { ... },
"alertSeverity": { ... },
"monitor": { ... },
"createdAt": "2026-01-20T10:57:00Z",
"episodeMembership": {
"addedBy": "rule",
"addedAt": "2026-01-20T10:57:00Z",
"groupingRule": { "_id": "rule-id", "name": "Database alerts" }
}
}
],
"count": 15,
"skip": 0,
"limit": 10
}
```
---
### Add Alert to Episode
```http
POST /api/project/{projectId}/alert-episode/{episodeId}/add-alert
```
**Request Body:**
```json
{
"alertId": "alert-id-to-add"
}
```
---
### Remove Alert from Episode
```http
POST /api/project/{projectId}/alert-episode/{episodeId}/remove-alert
```
**Request Body:**
```json
{
"alertId": "alert-id-to-remove"
}
```
---
### Merge Episodes
```http
POST /api/project/{projectId}/alert-episode/merge
```
**Request Body:**
```json
{
"targetEpisodeId": "episode-to-keep",
"sourceEpisodeIds": ["episode-to-merge-1", "episode-to-merge-2"]
}
```
All alerts from source episodes are moved to the target episode. Source episodes are deleted.
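The merge semantics above can be modelled in memory as follows. This is a sketch under stated assumptions, not the service implementation: `MergeableEpisode` and `mergeEpisodes` are hypothetical names, a real implementation would presumably run inside a database transaction, and the `CANNOT_MERGE_RESOLVED` check is not modelled.

```typescript
// Hypothetical in-memory model of an episode: just an ID and its alert IDs.
interface MergeableEpisode {
  id: string;
  alertIds: string[];
}

function mergeEpisodes(
  episodes: Map<string, MergeableEpisode>,
  targetEpisodeId: string,
  sourceEpisodeIds: string[],
): MergeableEpisode {
  const target = episodes.get(targetEpisodeId);
  if (target === undefined) {
    throw new Error("EPISODE_NOT_FOUND");
  }

  // Validate every source before mutating anything, so a bad ID cannot
  // leave a half-finished merge behind.
  const sources: MergeableEpisode[] = [];
  for (const sourceId of sourceEpisodeIds) {
    if (sourceId === targetEpisodeId) {
      continue; // merging an episode into itself is treated as a no-op
    }
    const source = episodes.get(sourceId);
    if (source === undefined) {
      throw new Error("EPISODE_NOT_FOUND");
    }
    sources.push(source);
  }

  for (const source of sources) {
    // Move alerts to the target, skipping any it already contains.
    for (const alertId of source.alertIds) {
      if (target.alertIds.indexOf(alertId) === -1) {
        target.alertIds.push(alertId);
      }
    }
    // Source episodes are deleted once their alerts have been moved.
    episodes.delete(source.id);
  }
  return target;
}
```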
---
### Split Episode
```http
POST /api/project/{projectId}/alert-episode/{episodeId}/split
```
**Request Body:**
```json
{
"alertIds": ["alert-id-1", "alert-id-2"],
"newEpisodeTitle": "Split Episode"
}
```
Creates a new episode containing the specified alerts and removes those alerts from the original episode.
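A minimal in-memory sketch of the split operation follows. The names `SplittableEpisode` and `splitEpisode` are hypothetical, and the new episode's ID is passed in only to keep the example deterministic; the server would presumably generate it.

```typescript
// Hypothetical in-memory model of an episode for illustrating a split.
interface SplittableEpisode {
  id: string;
  title: string;
  alertIds: string[];
}

function splitEpisode(
  original: SplittableEpisode,
  alertIds: string[],
  newEpisodeTitle: string,
  newEpisodeId: string, // assumption: generated by the server in practice
): { original: SplittableEpisode; created: SplittableEpisode } {
  // Every requested alert must currently belong to the original episode.
  for (const alertId of alertIds) {
    if (original.alertIds.indexOf(alertId) === -1) {
      throw new Error("ALERT_NOT_FOUND");
    }
  }

  // Partition the original alerts, preserving their existing order.
  const moved = original.alertIds.filter((id) => alertIds.indexOf(id) !== -1);
  const remaining = original.alertIds.filter((id) => alertIds.indexOf(id) === -1);

  return {
    original: { ...original, alertIds: remaining },
    created: { id: newEpisodeId, title: newEpisodeTitle, alertIds: moved },
  };
}
```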
---
### Get Episode Timeline
```http
GET /api/project/{projectId}/alert-episode/{episodeId}/timeline
```
**Response:**
```json
{
"data": [
{
"type": "alert_added",
"timestamp": "2026-01-20T10:57:00Z",
"description": "Alert #127 added to episode",
"alert": { "_id": "alert-id", "title": "MySQL connection pool exhausted" },
"addedBy": "rule"
},
{
"type": "state_change",
"timestamp": "2026-01-20T10:50:00Z",
"description": "Assigned to John Smith",
"user": { "_id": "user-id", "name": "John Smith" }
},
{
"type": "episode_created",
"timestamp": "2026-01-20T10:45:00Z",
"description": "Episode created with 3 initial alerts",
"groupingRule": { "_id": "rule-id", "name": "Database alerts - 5min" }
}
]
}
```
---
## Grouping Rules API
### List Grouping Rules
```http
GET /api/project/{projectId}/alert-grouping-rule
```
**Response:**
```json
{
"data": [
{
"_id": "rule-id-1",
"name": "Database Alerts - 5 minute window",
"description": "Groups database-related alerts within 5 minutes",
"isEnabled": true,
"priority": 1,
"matchCriteria": {
"labelIds": ["database-label-id"],
"titlePattern": ".*(connection|database|mysql|postgres).*"
},
"groupingConfig": {
"type": "time_window",
"timeWindowMinutes": 5
},
"episodeConfig": {
"titleTemplate": "{{severity}} - Database Issues",
"autoResolveWhenEmpty": true,
"breakAfterMinutesInactive": 60
}
}
],
"count": 3
}
```
---
### Get Grouping Rule
```http
GET /api/project/{projectId}/alert-grouping-rule/{ruleId}
```
---
### Create Grouping Rule
```http
POST /api/project/{projectId}/alert-grouping-rule
```
**Request Body:**
```json
{
"name": "Database Alerts - 5 minute window",
"description": "Groups database-related alerts within 5 minutes",
"isEnabled": true,
"priority": 1,
"matchCriteria": {
"severityIds": ["critical-id", "high-id"],
"labelIds": ["database-label-id"],
"titlePattern": ".*(connection|database).*"
},
"groupingConfig": {
"type": "time_window",
"timeWindowMinutes": 5
},
"episodeConfig": {
"titleTemplate": "{{severity}} - Database Issues",
"autoResolveWhenEmpty": true,
"breakAfterMinutesInactive": 60
}
}
```
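To make the role of `matchCriteria` and `priority` concrete, here is a sketch of how a rule could be selected for an incoming alert. The matching semantics are assumptions, since this document does not spell them out: criteria are combined with AND, an omitted criterion matches everything, `labelIds` matches when the alert carries at least one listed label, and `titlePattern` is applied case-insensitively (which is what lets a title like "MySQL timeout" match a lowercase `mysql` alternative). All type and function names are hypothetical.

```typescript
interface MatchCriteria {
  severityIds?: string[];
  labelIds?: string[];
  titlePattern?: string;
}

interface RuleSketch {
  id: string;
  isEnabled: boolean;
  priority: number; // lower = evaluated first
  matchCriteria?: MatchCriteria;
}

interface AlertSketch {
  id: string;
  title: string;
  severityId: string;
  labelIds: string[];
}

// Assumed semantics: all present criteria must match (logical AND).
function alertMatchesRule(alert: AlertSketch, rule: RuleSketch): boolean {
  const criteria: MatchCriteria = rule.matchCriteria ?? {};

  if (
    criteria.severityIds &&
    criteria.severityIds.length > 0 &&
    criteria.severityIds.indexOf(alert.severityId) === -1
  ) {
    return false;
  }

  if (criteria.labelIds && criteria.labelIds.length > 0) {
    const wanted = criteria.labelIds;
    const hasAny = alert.labelIds.some((id) => wanted.indexOf(id) !== -1);
    if (!hasAny) {
      return false;
    }
  }

  if (criteria.titlePattern) {
    // Assumption: the pattern is applied case-insensitively.
    if (!new RegExp(criteria.titlePattern, "i").test(alert.title)) {
      return false;
    }
  }
  return true;
}

// Returns the first enabled rule, in ascending priority order, that matches.
function pickRule(alert: AlertSketch, rules: RuleSketch[]): RuleSketch | undefined {
  const candidates = rules
    .filter((rule) => rule.isEnabled)
    .sort((a, b) => a.priority - b.priority);
  for (const rule of candidates) {
    if (alertMatchesRule(alert, rule)) {
      return rule;
    }
  }
  return undefined;
}
```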
---
### Update Grouping Rule
```http
PUT /api/project/{projectId}/alert-grouping-rule/{ruleId}
```
---
### Delete Grouping Rule
```http
DELETE /api/project/{projectId}/alert-grouping-rule/{ruleId}
```
---
### Enable/Disable Grouping Rule
```http
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/enable
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/disable
```
---
### Test Grouping Rule
```http
POST /api/project/{projectId}/alert-grouping-rule/{ruleId}/test
```
**Request Body:**
```json
{
"alertIds": ["alert-id-1", "alert-id-2", "alert-id-3"]
}
```
**Response:**
```json
{
"matchedAlerts": [
{ "_id": "alert-id-1", "title": "MySQL timeout", "wouldMatch": true },
{ "_id": "alert-id-2", "title": "API error", "wouldMatch": false },
{ "_id": "alert-id-3", "title": "PostgreSQL error", "wouldMatch": true }
],
"wouldCreateEpisodes": 1,
"groupingPreview": [
{
"episodeTitle": "Critical - Database Issues",
"alerts": ["alert-id-1", "alert-id-3"]
}
]
}
```
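For a `time_window` rule, a `groupingPreview` like the one above could be derived along these lines. This is only a sketch: it assumes a sliding window, where an alert joins the current group if it arrives within `timeWindowMinutes` of the previous alert in that group. The document does not say whether the window is measured from the group's first or most recent alert, and `TimedAlert` and `groupByTimeWindow` are hypothetical names.

```typescript
interface TimedAlert {
  id: string;
  createdAt: string; // ISO 8601 timestamp, as in the API responses
}

// Groups alert IDs so that consecutive alerts in a group are at most
// `timeWindowMinutes` apart (sliding-window assumption).
function groupByTimeWindow(
  alerts: TimedAlert[],
  timeWindowMinutes: number,
): string[][] {
  const windowMs = timeWindowMinutes * 60 * 1000;

  // Sort a copy by arrival time so the caller's array is left untouched.
  const sorted = alerts
    .slice()
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));

  const groups: string[][] = [];
  let current: string[] = [];
  let lastTimestamp: number | undefined;

  for (const alert of sorted) {
    const timestamp = Date.parse(alert.createdAt);
    // Start a new group for the first alert or when the gap is too large.
    if (lastTimestamp === undefined || timestamp - lastTimestamp > windowMs) {
      current = [];
      groups.push(current);
    }
    current.push(alert.id);
    lastTimestamp = timestamp;
  }
  return groups;
}
```

Each inner array corresponds to the `alerts` list of one previewed episode.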
---
## Existing Alert API Changes
### Alert Response Enhancement
The existing Alert response will include episode information:
```json
{
"_id": "alert-id",
"alertNumber": 127,
"title": "MySQL connection pool exhausted",
"episode": {
"_id": "episode-id",
"episodeNumber": 42,
"title": "Database Connectivity Issues"
},
"fingerprint": "abc123...",
"duplicateCount": 5
}
```
### Filter Alerts by Episode
```http
GET /api/project/{projectId}/alert?episodeId={episodeId}
```
### Get Ungrouped Alerts
```http
GET /api/project/{projectId}/alert?episodeId=null
```
---
## API Implementation Notes
### Permissions
| Endpoint | Required Permission |
|----------|---------------------|
| GET episodes | `ProjectMember` |
| Create/Update/Delete episodes | `ProjectAdmin` |
| Acknowledge/Resolve episodes | `ProjectMember` |
| GET grouping rules | `ProjectMember` |
| Create/Update/Delete grouping rules | `ProjectAdmin` |
### Error Responses
```json
{
"error": {
"code": "EPISODE_NOT_FOUND",
"message": "Episode with ID xxx not found"
}
}
```
Common error codes:
- `EPISODE_NOT_FOUND` - Episode doesn't exist
- `ALERT_NOT_FOUND` - Alert doesn't exist
- `ALERT_ALREADY_IN_EPISODE` - Alert is already part of an episode
- `CANNOT_MERGE_RESOLVED` - Cannot merge resolved episodes
- `INVALID_GROUPING_CONFIG` - Invalid grouping rule configuration
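On the client side, the error shape and codes listed above could be captured in a small type guard. This is an illustrative sketch; the names `EPISODE_ERROR_CODES`, `EpisodeApiErrorBody`, and `isEpisodeApiError` are hypothetical.

```typescript
// The error codes listed in this document.
const EPISODE_ERROR_CODES = [
  "EPISODE_NOT_FOUND",
  "ALERT_NOT_FOUND",
  "ALERT_ALREADY_IN_EPISODE",
  "CANNOT_MERGE_RESOLVED",
  "INVALID_GROUPING_CONFIG",
] as const;

type EpisodeApiErrorCode = (typeof EPISODE_ERROR_CODES)[number];

interface EpisodeApiErrorBody {
  error: {
    code: EpisodeApiErrorCode;
    message: string;
  };
}

// Narrows an unknown response body to the documented error shape.
function isEpisodeApiError(body: unknown): body is EpisodeApiErrorBody {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const error = (body as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) {
    return false;
  }
  const { code, message } = error as { code?: unknown; message?: unknown };
  return (
    typeof message === "string" &&
    typeof code === "string" &&
    (EPISODE_ERROR_CODES as readonly string[]).indexOf(code) !== -1
  );
}
```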
### Rate Limiting
Standard API rate limits apply. Batch operations (merge, bulk add) count as multiple operations.
---
## Implementation Checklist
### Episode API
- [ ] GET /alert-episode (list)
- [ ] GET /alert-episode/:id (details)
- [ ] POST /alert-episode (create)
- [ ] PUT /alert-episode/:id (update)
- [ ] DELETE /alert-episode/:id (delete)
- [ ] POST /alert-episode/:id/acknowledge
- [ ] POST /alert-episode/:id/resolve
- [ ] GET /alert-episode/:id/alerts
- [ ] POST /alert-episode/:id/add-alert
- [ ] POST /alert-episode/:id/remove-alert
- [ ] POST /alert-episode/merge
- [ ] POST /alert-episode/:id/split
- [ ] GET /alert-episode/:id/timeline
### Grouping Rule API
- [ ] GET /alert-grouping-rule (list)
- [ ] GET /alert-grouping-rule/:id (details)
- [ ] POST /alert-grouping-rule (create)
- [ ] PUT /alert-grouping-rule/:id (update)
- [ ] DELETE /alert-grouping-rule/:id (delete)
- [ ] POST /alert-grouping-rule/:id/enable
- [ ] POST /alert-grouping-rule/:id/disable
- [ ] POST /alert-grouping-rule/:id/test
### Alert API Updates
- [ ] Add episode field to alert response
- [ ] Add episodeId filter to alert list
- [ ] Add fingerprint field to alert response

@@ -1,669 +0,0 @@
# UI Implementation for Alert Grouping
## Overview
This document details the frontend components and pages required for Alert Grouping / Episodes functionality.
## Navigation Structure
```
Dashboard
├── Alerts
│   ├── All Alerts (existing)
│   └── Episodes (NEW)
└── Settings
    └── Alerts
        ├── Alert States (existing)
        ├── Alert Severities (existing)
        └── Grouping Rules (NEW)
```
---
## Pages to Create
### 1. Episodes List Page
**File Location:** `/Dashboard/src/Pages/Alerts/Episodes.tsx`
**Route:** `/dashboard/:projectId/alerts/episodes`
**Wireframe:**
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Alerts > Episodes [+ Create Episode] │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────┬──────────────┬────────────┬───────┐ ┌─────────────────────────────┐ │
│ │ Active │ Acknowledged │ Resolved │ All │ │ 🔍 Search episodes... │ │
│ │ (5) │ (2) │ (48) │ (55) │ └─────────────────────────────┘ │
│ └────────┴──────────────┴────────────┴───────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ ● EP-42 Database Connectivity Issues 🔴 Critical │ │
│ │ ┌─────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ 15 alerts │ 3 monitors │ Started 10 min ago │ Last activity: 2 min ago │ │ │
│ │ └─────────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Preview: │ │
│ │ • Alert #123: MySQL connection timeout on web-server-1 │ │
│ │ • Alert #124: MySQL connection timeout on web-server-2 │ │
│ │ • Alert #125: PostgreSQL connection refused on api-server │ │
│ │ └── +12 more alerts │ │
│ │ │ │
│ │ Rule: "Group database alerts within 5 min" [Acknowledge] [Resolve] │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ ● EP-41 High CPU Utilization 🟠 High │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ [1] [2] [3] ... [Next →] │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
**Implementation:**
```typescript
// /Dashboard/src/Pages/Alerts/Episodes.tsx
import React, { FunctionComponent, ReactElement } from 'react';
import PageComponentProps from '../PageComponentProps';
import ModelTable from 'Common/UI/Components/ModelTable/ModelTable';
import AlertEpisode from 'Common/Models/DatabaseModels/AlertEpisode';
import FieldType from 'Common/UI/Components/Types/FieldType';
import Navigation from 'Common/UI/Utils/Navigation';
import DashboardNavigation from '../../Utils/Navigation';
import AlertSeverity from 'Common/Models/DatabaseModels/AlertSeverity';
import AlertState from 'Common/Models/DatabaseModels/AlertState';
import Pill from 'Common/UI/Components/Pill/Pill';
import { Black } from 'Common/Types/BrandColors';
const EpisodesPage: FunctionComponent<PageComponentProps> = (
props: PageComponentProps
): ReactElement => {
return (
<ModelTable<AlertEpisode>
modelType={AlertEpisode}
id="episodes-table"
isDeleteable={true}
isEditable={false}
isCreateable={true}
isViewable={true}
name="Episodes"
query={{
projectId: DashboardNavigation.getProjectId()!,
}}
cardProps={{
title: 'Episodes',
description:
'Episodes group related alerts together for easier management.',
}}
selectMoreFields={{
alertCount: true,
uniqueMonitorCount: true,
startedAt: true,
lastActivityAt: true,
}}
columns={[
{
field: {
episodeNumber: true,
},
title: 'Episode',
type: FieldType.Text,
getElement: (item: AlertEpisode): ReactElement => {
return (
<span className="font-medium">
EP-{item.episodeNumber}
</span>
);
},
},
{
field: {
title: true,
},
title: 'Title',
type: FieldType.Text,
},
{
field: {
currentAlertState: {
name: true,
color: true,
},
},
title: 'State',
type: FieldType.Entity,
getElement: (item: AlertEpisode): ReactElement => {
if (!item.currentAlertState) {
return <></>;
}
return (
<Pill
text={item.currentAlertState.name || ''}
color={item.currentAlertState.color || Black}
/>
);
},
},
{
field: {
alertSeverity: {
name: true,
color: true,
},
},
title: 'Severity',
type: FieldType.Entity,
getElement: (item: AlertEpisode): ReactElement => {
if (!item.alertSeverity) {
return <></>;
}
return (
<Pill
text={item.alertSeverity.name || ''}
color={item.alertSeverity.color || Black}
/>
);
},
},
{
field: {
alertCount: true,
},
title: 'Alerts',
type: FieldType.Number,
},
{
field: {
lastActivityAt: true,
},
title: 'Last Activity',
type: FieldType.DateTime,
},
]}
filters={[
{
field: {
currentAlertState: {
_id: true,
},
},
title: 'State',
type: FieldType.Entity,
filterEntityType: AlertState,
filterQuery: {
projectId: DashboardNavigation.getProjectId()!,
},
},
{
field: {
alertSeverity: {
_id: true,
},
},
title: 'Severity',
type: FieldType.Entity,
filterEntityType: AlertSeverity,
filterQuery: {
projectId: DashboardNavigation.getProjectId()!,
},
},
]}
onViewPage={(item: AlertEpisode): void => {
Navigation.navigate(
DashboardNavigation.getAlertEpisodeViewRoute(item._id!)
);
}}
/>
);
};
export default EpisodesPage;
```
---
### 2. Episode Detail Page
**File Location:** `/Dashboard/src/Pages/Alerts/EpisodeView/Index.tsx`
**Route:** `/dashboard/:projectId/alerts/episodes/:episodeId`
**Wireframe:**
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ ← Episodes EP-42: Database Connectivity Issues │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────┐ ┌──────────────────────────────┐ │
│ │ Status │ 🔴 Active │ │ Actions │ │
│ │ Severity │ Critical │ │ ┌────────────────────────┐ │ │
│ │ Started │ Jan 20, 2026 10:45 AM │ │ │ [Acknowledge] │ │ │
│ │ Last Activity │ 2 min ago │ │ │ [Resolve] │ │ │
│ │ Alert Count │ 15 │ │ │ [Add Alert] │ │ │
│ │ Monitors │ 3 │ │ │ [Merge Episodes] │ │ │
│ └──────────────────────────────────────────────┘ └──────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Tabs: [Overview] [Alerts (15)] [Timeline] [Settings] │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ OVERVIEW TAB: │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Description [Edit] │ │
│ │ Multiple database connection failures affecting production services │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Assigned To [Edit] │ │
│ │ 👤 John Smith (DBA Team) │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Root Cause Analysis [Edit] │ │
│ │ Database connection pool exhausted due to connection leak in payment service │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Grouping Rule │ │
│ │ "Database alerts - 5min" (Time Window: 5 minutes) │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
**Sub-pages:**
| Route | Component | Description |
|-------|-----------|-------------|
| `/episodes/:id` | Overview | Episode details, owners, root cause |
| `/episodes/:id/alerts` | Alerts | List of alerts in episode |
| `/episodes/:id/timeline` | Timeline | Episode activity timeline |
| `/episodes/:id/settings` | Settings | Delete episode |
---
### 3. Episode Alerts Tab
**File Location:** `/Dashboard/src/Pages/Alerts/EpisodeView/Alerts.tsx`
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ ALERTS TAB: [+ Add Alert] │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────┬──────────────────────────────────────────────┬──────────┬───────┬──────┐ │
│ │ ID │ Title │ Monitor │ State │ ··· │ │
│ ├───────┼──────────────────────────────────────────────┼──────────┼───────┼──────┤ │
│ │ #127 │ MySQL connection pool exhausted │ mysql-01 │ ● Act │ [x] │ │
│ │ #126 │ MySQL connection timeout │ web-02 │ ● Act │ [x] │ │
│ │ #125 │ PostgreSQL connection refused │ api-01 │ ✓ Res │ [x] │ │
│ │ #124 │ MySQL connection timeout │ web-02 │ ● Act │ [x] │ │
│ │ #123 │ MySQL connection timeout │ web-01 │ ● Act │ [x] │ │
│ └───────┴──────────────────────────────────────────────┴──────────┴───────┴──────┘ │
│ │
│ Note: [x] = Remove from episode button │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
---
### 4. Grouping Rules Page
**File Location:** `/Dashboard/src/Pages/Settings/AlertGroupingRules.tsx`
**Route:** `/dashboard/:projectId/settings/alert-grouping-rules`
**Wireframe:**
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Settings > Alert Grouping Rules [+ Create Rule] │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Grouping rules automatically combine related alerts into Episodes. │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ ✅ Database Alerts - 5 minute window Priority: 1 │ │
│ │ ────────────────────────────────────────────────────────────────────────────── │ │
│ │ Type: Time Window (5 minutes) │ │
│ │ Matches: Monitors with label "database" │ │
│ │ Episodes created: 23 │ Alerts grouped: 156 │ │
│ │ [Edit] [Delete]│ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ ❌ Smart Grouping (Disabled) Priority: 2 │ │
│ │ ────────────────────────────────────────────────────────────────────────────── │ │
│ │ Type: Smart (80% similarity) │ │
│ │ Matches: All critical alerts │ │
│ │ [Enable] [Edit] [Delete]│ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
---
### 5. Create/Edit Grouping Rule Form
**File Location:** `/Dashboard/src/Pages/Settings/AlertGroupingRuleView/Index.tsx`
**Wireframe:**
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ Create Grouping Rule │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ BASIC INFORMATION │
│ ───────────────────────────────────────────────────────────────────────────────── │
│ │
│ Rule Name * │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Database Alerts - 5 minute window │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Description │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Groups database-related alerts within 5 minutes │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Priority (lower = evaluated first) │
│ ┌──────────┐ │
│ │ 1 │ │
│ └──────────┘ │
│ │
│ MATCHING CRITERIA │
│ ───────────────────────────────────────────────────────────────────────────────── │
│ Which alerts should this rule apply to? │
│ │
│ Severities (optional) │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ [Critical ×] [High ×] [+ Add] │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Labels (optional) │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ [database ×] [+ Add] │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Title Pattern (regex, optional) │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ .*(connection|database|mysql|postgres).* │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ GROUPING METHOD │
│ ───────────────────────────────────────────────────────────────────────────────── │
│ │
│ Grouping Type * │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ ● Time Window │ │ ○ Field-Based │ │ ○ Smart (Beta) │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│ │
│ Time Window (minutes) * │
│ ┌──────────┐ │
│ │ 5 │ Alerts arriving within this window will be grouped together. │
│ └──────────┘ │
│ │
│ EPISODE SETTINGS │
│ ───────────────────────────────────────────────────────────────────────────────── │
│ │
│ Episode Title Template │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ {{severity}} - Database Issues │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ Available: {{severity}}, {{monitor}}, {{alertCount}} │
│ │
│ ☑ Auto-resolve episode when all alerts are resolved │
│ │
│ Break episode after inactive for (minutes) │
│ ┌──────────┐ │
│ │ 60 │ │
│ └──────────┘ │
│ │
│ [Cancel] [Test Rule] [Save] │
│ │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
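The title template shown in the form could be rendered with a simple placeholder substitution. The sketch below covers only the three variables the form lists (`{{severity}}`, `{{monitor}}`, `{{alertCount}}`); leaving unknown placeholders untouched is an assumption, and `TitleTemplateValues` and `renderEpisodeTitle` are hypothetical names.

```typescript
interface TitleTemplateValues {
  severity: string;
  monitor: string;
  alertCount: number;
}

// Replaces {{severity}}, {{monitor}} and {{alertCount}} in the template.
function renderEpisodeTitle(
  template: string,
  values: TitleTemplateValues,
): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (placeholder: string, key: string): string => {
      switch (key) {
        case "severity":
          return values.severity;
        case "monitor":
          return values.monitor;
        case "alertCount":
          return String(values.alertCount);
        default:
          // Assumption: unknown variables are left as written.
          return placeholder;
      }
    },
  );
}
```

For example, rendering `{{severity}} - Database Issues` for a critical episode gives the "Critical - Database Issues" title used elsewhere in these documents.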
---
## Existing Page Modifications
### 1. Alerts Table Enhancement
Add Episode column to the existing Alerts table.
**File:** `/Dashboard/src/Pages/Alerts/View/Index.tsx`
```typescript
// Add to columns array:
{
  field: {
    episode: {
      _id: true,
      episodeNumber: true,
      title: true,
    },
  },
  title: 'Episode',
  type: FieldType.Entity,
  getElement: (item: Alert): ReactElement => {
    if (!item.episode) {
      return <span className="text-gray-400"></span>;
    }
    return (
      <Link
        to={DashboardNavigation.getAlertEpisodeViewRoute(
          item.episode._id!
        )}
      >
        EP-{item.episode.episodeNumber}
      </Link>
    );
  },
},
```
### 2. Alert Detail Page Enhancement
Show episode membership on alert detail page.
**File:** `/Dashboard/src/Pages/Alerts/AlertView/Index.tsx`
Add a card showing:
- Episode badge (if part of episode)
- Link to episode detail
- Button to remove from episode
---
## Components to Create
### 1. EpisodeCard Component
**File:** `/Dashboard/src/Components/Episode/EpisodeCard.tsx`
Reusable card for displaying episode summary.
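```typescript
import AlertEpisode from 'Common/Models/DatabaseModels/AlertEpisode';

interface EpisodeCardProps {
  episode: AlertEpisode;
  showAlertPreview?: boolean; // show the first few alerts in the card
  onAcknowledge?: () => void;
  onResolve?: () => void;
}
```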
### 2. EpisodeBadge Component
**File:** `/Dashboard/src/Components/Episode/EpisodeBadge.tsx`
Small badge showing episode number and link.
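```typescript
import ObjectID from 'Common/Types/ObjectID';

interface EpisodeBadgeProps {
  episodeNumber: number; // rendered as EP-{episodeNumber}
  episodeId: ObjectID; // used to build the link to the episode
}
```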
### 3. AddAlertToEpisodeModal Component
**File:** `/Dashboard/src/Components/Episode/AddAlertToEpisodeModal.tsx`
Modal for manually adding alerts to an episode.
### 4. MergeEpisodesModal Component
**File:** `/Dashboard/src/Components/Episode/MergeEpisodesModal.tsx`
Modal for merging multiple episodes.
### 5. GroupingRuleForm Component
**File:** `/Dashboard/src/Components/GroupingRule/GroupingRuleForm.tsx`
Form for creating/editing grouping rules with:
- Match criteria builder
- Grouping type selector
- Episode config options
---
## Routing Configuration
Add to `/Dashboard/src/Routes/AlertRoutes.tsx`:
```typescript
// Episode routes
{
path: '/dashboard/:projectId/alerts/episodes',
component: EpisodesPage,
},
{
path: '/dashboard/:projectId/alerts/episodes/:episodeId',
component: EpisodeViewLayout,
children: [
{
path: '',
component: EpisodeOverview,
},
{
path: 'alerts',
component: EpisodeAlerts,
},
{
path: 'timeline',
component: EpisodeTimeline,
},
{
path: 'settings',
component: EpisodeSettings,
},
],
},
```
Add to `/Dashboard/src/Routes/SettingsRoutes.tsx`:
```typescript
// Grouping rule routes
{
path: '/dashboard/:projectId/settings/alert-grouping-rules',
component: AlertGroupingRulesPage,
},
{
path: '/dashboard/:projectId/settings/alert-grouping-rules/:ruleId',
component: AlertGroupingRuleViewLayout,
},
```
---
## Navigation Helper Updates
Add to `/Dashboard/src/Utils/Navigation.ts`:
```typescript
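public static getAlertEpisodesRoute(projectId?: ObjectID): Route {
  // Fall back to the current project so the path never contains "undefined"
  // just because the optional argument was omitted.
  const id: string | undefined = (
    projectId || this.getProjectId()
  )?.toString();
  return new Route(`/dashboard/${id}/alerts/episodes`);
}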
public static getAlertEpisodeViewRoute(episodeId: ObjectID): Route {
return new Route(
`/dashboard/${this.getProjectId()?.toString()}/alerts/episodes/${episodeId.toString()}`
);
}
public static getAlertGroupingRulesRoute(): Route {
return new Route(
`/dashboard/${this.getProjectId()?.toString()}/settings/alert-grouping-rules`
);
}
```
---
## Sidebar Menu Updates
Add to Alerts section in `/Dashboard/src/Components/Sidebar/Sidebar.tsx`:
```typescript
{
title: 'Episodes',
route: RouteMap.AlertEpisodes,
icon: IconProp.Layers,
}
```
Add to Settings > Alerts section:
```typescript
{
title: 'Grouping Rules',
route: RouteMap.AlertGroupingRules,
icon: IconProp.Layers,
}
```
---
## Implementation Checklist
### Pages
- [ ] Episodes list page
- [ ] Episode detail page (overview)
- [ ] Episode alerts tab
- [ ] Episode timeline tab
- [ ] Episode settings tab
- [ ] Grouping rules list page
- [ ] Grouping rule detail/edit page
### Components
- [ ] EpisodeCard component
- [ ] EpisodeBadge component
- [ ] AddAlertToEpisodeModal
- [ ] MergeEpisodesModal
- [ ] GroupingRuleForm
- [ ] GroupingTypeSelector
### Existing Page Updates
- [ ] Add Episode column to Alerts table
- [ ] Add Episode card to Alert detail page
- [ ] Add sidebar navigation items
- [ ] Update route configuration
### Styling
- [ ] Episode card styles
- [ ] Episode badge styles
- [ ] Grouping rule form styles
- [ ] Timeline component styles

@@ -1,888 +0,0 @@
# Migration & Rollout Plan for Alert Grouping
## Overview
This document outlines the database migrations, feature flags, and rollout strategy for Alert Grouping / Episodes functionality.
## Database Migrations
### Migration 1: Create AlertGroupingRule Table
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertGroupingRule.ts`
```typescript
import { MigrationInterface, QueryRunner, Table, TableIndex } from 'typeorm';
export class CreateAlertGroupingRule implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
await queryRunner.createTable(
new Table({
name: 'AlertGroupingRule',
columns: [
{
name: '_id',
type: 'uuid',
isPrimary: true,
default: 'uuid_generate_v4()',
},
{
name: 'projectId',
type: 'uuid',
isNullable: false,
},
{
name: 'name',
type: 'varchar',
length: '500',
isNullable: false,
},
{
name: 'description',
type: 'text',
isNullable: true,
},
{
name: 'isEnabled',
type: 'boolean',
default: true,
},
{
name: 'matchCriteria',
type: 'jsonb',
isNullable: true,
},
{
name: 'groupingConfig',
type: 'jsonb',
isNullable: false,
},
{
name: 'episodeConfig',
type: 'jsonb',
isNullable: false,
},
{
name: 'priority',
type: 'integer',
default: 100,
},
{
name: 'createdAt',
type: 'timestamp',
default: 'CURRENT_TIMESTAMP',
},
{
name: 'updatedAt',
type: 'timestamp',
default: 'CURRENT_TIMESTAMP',
},
{
name: 'deletedAt',
type: 'timestamp',
isNullable: true,
},
],
}),
true
);
await queryRunner.createIndex(
'AlertGroupingRule',
new TableIndex({
name: 'idx_grouping_rule_project_enabled',
columnNames: ['projectId', 'isEnabled', 'priority'],
})
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.dropTable('AlertGroupingRule');
}
}
```
---
### Migration 2: Create AlertEpisode Table
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertEpisode.ts`
```typescript
import { MigrationInterface, QueryRunner, Table, TableIndex, TableForeignKey } from 'typeorm';
export class CreateAlertEpisode implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
await queryRunner.createTable(
new Table({
name: 'AlertEpisode',
columns: [
{
name: '_id',
type: 'uuid',
isPrimary: true,
default: 'uuid_generate_v4()',
},
{
name: 'projectId',
type: 'uuid',
isNullable: false,
},
{
name: 'episodeNumber',
type: 'integer',
isNullable: false,
},
{
name: 'title',
type: 'varchar',
length: '500',
isNullable: false,
},
{
name: 'description',
type: 'text',
isNullable: true,
},
{
name: 'groupingRuleId',
type: 'uuid',
isNullable: true,
},
{
name: 'currentAlertStateId',
type: 'uuid',
isNullable: true,
},
{
name: 'alertSeverityId',
type: 'uuid',
isNullable: true,
},
{
name: 'startedAt',
type: 'timestamp',
isNullable: false,
},
{
name: 'lastActivityAt',
type: 'timestamp',
isNullable: false,
},
{
name: 'acknowledgedAt',
type: 'timestamp',
isNullable: true,
},
{
name: 'resolvedAt',
type: 'timestamp',
isNullable: true,
},
{
name: 'alertCount',
type: 'integer',
default: 0,
},
{
name: 'uniqueMonitorCount',
type: 'integer',
default: 0,
},
{
name: 'rootCause',
type: 'text',
isNullable: true,
},
{
name: 'createdAt',
type: 'timestamp',
default: 'CURRENT_TIMESTAMP',
},
{
name: 'updatedAt',
type: 'timestamp',
default: 'CURRENT_TIMESTAMP',
},
{
name: 'deletedAt',
type: 'timestamp',
isNullable: true,
},
],
}),
true
);
// Indexes
await queryRunner.createIndex(
'AlertEpisode',
new TableIndex({
name: 'idx_episode_project_state',
columnNames: ['projectId', 'currentAlertStateId', 'lastActivityAt'],
})
);
await queryRunner.createIndex(
'AlertEpisode',
new TableIndex({
name: 'idx_episode_grouping_rule',
columnNames: ['projectId', 'groupingRuleId', 'currentAlertStateId'],
})
);
// Foreign keys
await queryRunner.createForeignKey(
'AlertEpisode',
new TableForeignKey({
columnNames: ['projectId'],
referencedTableName: 'Project',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisode',
new TableForeignKey({
columnNames: ['groupingRuleId'],
referencedTableName: 'AlertGroupingRule',
referencedColumnNames: ['_id'],
onDelete: 'SET NULL',
})
);
await queryRunner.createForeignKey(
'AlertEpisode',
new TableForeignKey({
columnNames: ['currentAlertStateId'],
referencedTableName: 'AlertState',
referencedColumnNames: ['_id'],
onDelete: 'SET NULL',
})
);
await queryRunner.createForeignKey(
'AlertEpisode',
new TableForeignKey({
columnNames: ['alertSeverityId'],
referencedTableName: 'AlertSeverity',
referencedColumnNames: ['_id'],
onDelete: 'SET NULL',
})
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.dropTable('AlertEpisode');
}
}
```
---
### Migration 3: Create AlertEpisodeMember Table
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateAlertEpisodeMember.ts`
```typescript
import { MigrationInterface, QueryRunner, Table, TableIndex, TableForeignKey } from 'typeorm';
export class CreateAlertEpisodeMember implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
await queryRunner.createTable(
new Table({
name: 'AlertEpisodeMember',
columns: [
{
name: '_id',
type: 'uuid',
isPrimary: true,
default: 'uuid_generate_v4()',
},
{
name: 'projectId',
type: 'uuid',
isNullable: false,
},
{
name: 'episodeId',
type: 'uuid',
isNullable: false,
},
{
name: 'alertId',
type: 'uuid',
isNullable: false,
},
{
name: 'addedBy',
type: 'varchar',
length: '50',
isNullable: false,
},
{
name: 'addedAt',
type: 'timestamp',
isNullable: false,
},
{
name: 'groupingRuleId',
type: 'uuid',
isNullable: true,
},
{
name: 'similarityScore',
type: 'decimal',
precision: 5,
scale: 4,
isNullable: true,
},
{
name: 'createdAt',
type: 'timestamp',
default: 'CURRENT_TIMESTAMP',
},
{
name: 'deletedAt',
type: 'timestamp',
isNullable: true,
},
],
}),
true
);
// Indexes
await queryRunner.createIndex(
'AlertEpisodeMember',
new TableIndex({
name: 'idx_episode_member_episode',
columnNames: ['episodeId', 'addedAt'],
})
);
await queryRunner.createIndex(
'AlertEpisodeMember',
new TableIndex({
name: 'idx_episode_member_alert',
columnNames: ['alertId'],
})
);
await queryRunner.createIndex(
'AlertEpisodeMember',
new TableIndex({
name: 'idx_episode_member_unique',
columnNames: ['episodeId', 'alertId'],
isUnique: true,
})
);
// Foreign keys
await queryRunner.createForeignKey(
'AlertEpisodeMember',
new TableForeignKey({
columnNames: ['projectId'],
referencedTableName: 'Project',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisodeMember',
new TableForeignKey({
columnNames: ['episodeId'],
referencedTableName: 'AlertEpisode',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisodeMember',
new TableForeignKey({
columnNames: ['alertId'],
referencedTableName: 'Alert',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.dropTable('AlertEpisodeMember');
}
}
```
---
### Migration 4: Add Episode Fields to Alert Table
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-AddEpisodeFieldsToAlert.ts`
```typescript
import { MigrationInterface, QueryRunner, TableColumn, TableIndex, TableForeignKey } from 'typeorm';
export class AddEpisodeFieldsToAlert implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
// Add episodeId column
await queryRunner.addColumn(
'Alert',
new TableColumn({
name: 'episodeId',
type: 'uuid',
isNullable: true,
})
);
// Add fingerprint column
await queryRunner.addColumn(
'Alert',
new TableColumn({
name: 'fingerprint',
type: 'varchar',
length: '64',
isNullable: true,
})
);
// Add duplicateCount column
await queryRunner.addColumn(
'Alert',
new TableColumn({
name: 'duplicateCount',
type: 'integer',
default: 0,
})
);
// Add lastDuplicateAt column
await queryRunner.addColumn(
'Alert',
new TableColumn({
name: 'lastDuplicateAt',
type: 'timestamp',
isNullable: true,
})
);
// Create indexes
await queryRunner.createIndex(
'Alert',
new TableIndex({
name: 'idx_alert_episode',
columnNames: ['episodeId'],
where: '"episodeId" IS NOT NULL',
})
);
await queryRunner.createIndex(
'Alert',
new TableIndex({
name: 'idx_alert_fingerprint',
columnNames: ['projectId', 'fingerprint'],
where: '"fingerprint" IS NOT NULL',
})
);
// Create foreign key
await queryRunner.createForeignKey(
'Alert',
new TableForeignKey({
name: 'fk_alert_episode',
columnNames: ['episodeId'],
referencedTableName: 'AlertEpisode',
referencedColumnNames: ['_id'],
onDelete: 'SET NULL',
})
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.dropForeignKey('Alert', 'fk_alert_episode');
await queryRunner.dropIndex('Alert', 'idx_alert_fingerprint');
await queryRunner.dropIndex('Alert', 'idx_alert_episode');
await queryRunner.dropColumn('Alert', 'lastDuplicateAt');
await queryRunner.dropColumn('Alert', 'duplicateCount');
await queryRunner.dropColumn('Alert', 'fingerprint');
await queryRunner.dropColumn('Alert', 'episodeId');
}
}
```
---
### Migration 5: Create Episode Join Tables
**File:** `/Common/Server/Infrastructure/Postgres/SchemaMigrations/XXXX-CreateEpisodeJoinTables.ts`
```typescript
import { MigrationInterface, QueryRunner, Table, TableForeignKey } from 'typeorm';
export class CreateEpisodeJoinTables implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
// AlertEpisodeOwnerUser join table
await queryRunner.createTable(
new Table({
name: 'AlertEpisodeOwnerUser',
columns: [
{
name: 'episodeId',
type: 'uuid',
isPrimary: true,
},
{
name: 'userId',
type: 'uuid',
isPrimary: true,
},
],
}),
true
);
await queryRunner.createForeignKey(
'AlertEpisodeOwnerUser',
new TableForeignKey({
columnNames: ['episodeId'],
referencedTableName: 'AlertEpisode',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisodeOwnerUser',
new TableForeignKey({
columnNames: ['userId'],
referencedTableName: 'User',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
// AlertEpisodeOwnerTeam join table
await queryRunner.createTable(
new Table({
name: 'AlertEpisodeOwnerTeam',
columns: [
{
name: 'episodeId',
type: 'uuid',
isPrimary: true,
},
{
name: 'teamId',
type: 'uuid',
isPrimary: true,
},
],
}),
true
);
await queryRunner.createForeignKey(
'AlertEpisodeOwnerTeam',
new TableForeignKey({
columnNames: ['episodeId'],
referencedTableName: 'AlertEpisode',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisodeOwnerTeam',
new TableForeignKey({
columnNames: ['teamId'],
referencedTableName: 'Team',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
// AlertEpisodeLabel join table
await queryRunner.createTable(
new Table({
name: 'AlertEpisodeLabel',
columns: [
{
name: 'episodeId',
type: 'uuid',
isPrimary: true,
},
{
name: 'labelId',
type: 'uuid',
isPrimary: true,
},
],
}),
true
);
await queryRunner.createForeignKey(
'AlertEpisodeLabel',
new TableForeignKey({
columnNames: ['episodeId'],
referencedTableName: 'AlertEpisode',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
await queryRunner.createForeignKey(
'AlertEpisodeLabel',
new TableForeignKey({
columnNames: ['labelId'],
referencedTableName: 'Label',
referencedColumnNames: ['_id'],
onDelete: 'CASCADE',
})
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.dropTable('AlertEpisodeLabel');
await queryRunner.dropTable('AlertEpisodeOwnerTeam');
await queryRunner.dropTable('AlertEpisodeOwnerUser');
}
}
```
---
## Feature Flags
### Project-Level Settings
Add these settings to the Project model, or create a separate AlertGroupingSettings model:
```typescript
interface AlertGroupingSettings {
  // Master switch
  groupingEnabled: boolean;

  // Auto-create episodes for new alerts
  autoCreateEpisodes: boolean;

  // Default time window for grouping (minutes)
  defaultTimeWindowMinutes: number;
}
```
### Implementation
```typescript
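// /Common/Server/Services/AlertGroupingSettingsService.ts
// NOTE: the import paths below are assumptions; adjust them to the repository layout.
import ObjectID from 'Common/Types/ObjectID';
import ProjectService from './ProjectService';

export default class AlertGroupingSettingsService {
  // Returns true only when the project has explicitly enabled grouping.
  public static async isGroupingEnabled(projectId: ObjectID): Promise<boolean> {
    const project = await ProjectService.findOneById({
      id: projectId,
      select: { alertGroupingEnabled: true },
      props: { isRoot: true },
    });

    // A missing project or an unset flag is treated as "disabled".
    return project?.alertGroupingEnabled ?? false;
  }
}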
```
### Usage in GroupingEngine
```typescript
// In GroupingEngine.processAlert():
const isEnabled = await AlertGroupingSettingsService.isGroupingEnabled(projectId);
if (!isEnabled) {
return { shouldGroup: false, isNewEpisode: false };
}
```
---
## Rollout Strategy
### Phase 1: Internal Alpha
**Duration:** 1 week
**Scope:**
- Enable for internal test projects only
- Feature flag: `ALERT_GROUPING_INTERNAL_ONLY=true`
**Validation:**
- Verify migrations run successfully
- Test basic grouping flow
- Check performance metrics
---
### Phase 2: Beta (Opt-in)
**Duration:** 2 weeks
**Scope:**
- Available to all projects but disabled by default
- Users must explicitly enable in Settings
- Show "Beta" badge on Episodes page
**Communication:**
- In-app announcement
- Documentation published
- Support team briefed
**Monitoring:**
- Episode creation rate
- Grouping accuracy feedback
- Performance metrics
---
### Phase 3: General Availability
**Duration:** Ongoing
**Scope:**
- Enabled by default for new projects
- Existing projects can opt-in via Settings
**Milestones:**
- Remove "Beta" badge
- Enable by default for all new projects
- Provide migration tool for existing alerts
---
## Backward Compatibility
### No Breaking Changes
1. **Existing alerts unchanged** - episodeId is nullable, defaults to null
2. **Existing API unchanged** - new fields added but not required
3. **Opt-in only** - grouping disabled until rules created
### Gradual Adoption
1. Users create grouping rules when ready
2. Only new alerts are grouped (after rule creation)
3. No retroactive grouping unless explicitly triggered
---
## Data Migration (Optional)
### Retroactive Alert Grouping
For users who want to group existing alerts:
```typescript
// /Worker/Jobs/AlertEpisode/RetroactiveGrouping.ts
export async function retroactivelyGroupAlerts(
  projectId: ObjectID,
  ruleId: ObjectID,
  startDate: Date,
  endDate: Date
): Promise<void> {
  // Load the rule and fail fast if it no longer exists
  const rule = await AlertGroupingRuleService.findOneById({ id: ruleId });
  if (!rule) {
    throw new Error(`Grouping rule ${ruleId.toString()} not found`);
  }

  // Get alerts in the date range that are not yet part of an episode
  const alerts = await AlertService.findBy({
    query: {
      projectId,
      createdAt: QueryHelper.between(startDate, endDate),
      episodeId: QueryHelper.isNull(),
    },
    select: { ... },
    props: { isRoot: true },
  });

  // Run each alert through the grouping engine, one at a time
  for (const alert of alerts) {
    await GroupingEngine.processAlert(alert, projectId);
  }
}
```
This would be triggered via Admin UI or API endpoint.
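
For large date ranges, loading every matching alert in one query may be expensive. The sketch below shows one way to walk the range in fixed-size batches with a cursor instead of an offset, which matters because alerts that get an `episodeId` drop out of the `episodeId IS NULL` filter while the job runs. `fetchAfter` and `handle` are hypothetical callbacks standing in for a paginated `AlertService.findBy` call and `GroupingEngine.processAlert`; neither signature comes from the plan. Alerts sharing an identical `createdAt` across a batch boundary would need a compound (createdAt, id) cursor, which this sketch omits.

```typescript
// Hypothetical shape: only the fields this sketch needs.
export interface AlertRef {
  id: string;
  createdAt: Date;
}

// Hypothetical callback: returns up to `limit` alerts created strictly after
// `after` (or from the beginning when `after` is null), ordered by createdAt ascending.
export type FetchAfter = (after: Date | null, limit: number) => Promise<AlertRef[]>;

export async function processInBatches(
  fetchAfter: FetchAfter,
  handle: (alert: AlertRef) => Promise<void>,
  batchSize: number
): Promise<number> {
  let cursor: Date | null = null;
  let processed = 0;

  for (;;) {
    const batch = await fetchAfter(cursor, batchSize);
    if (batch.length === 0) {
      // Nothing left in the range.
      return processed;
    }
    for (const alert of batch) {
      await handle(alert);
      processed += 1;
      // Advance the cursor to the newest alert handled so far.
      cursor = alert.createdAt;
    }
  }
}
```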
---
## Rollback Plan
### Database Rollback
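If issues are discovered, the migrations can be rolled back. Assuming the script wraps TypeORM's `migration:revert`, each run reverts only the most recently executed migration, so run it once per migration to undo: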
```bash
npm run migration:revert
```
### Feature Flag Disable
Immediately disable grouping for all projects:
```bash
# Set environment variable
ALERT_GROUPING_GLOBAL_DISABLE=true
```
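
A minimal sketch of how the global switch could take precedence over the project-level flag. The helper name `isGroupingActive` and the choice to pass the environment in as a plain record are assumptions made for illustration; the plan only names the `ALERT_GROUPING_GLOBAL_DISABLE` variable.

```typescript
// Hypothetical helper: combines the global kill switch with the per-project flag.
export function isGroupingActive(
  env: Record<string, string | undefined>,
  projectGroupingEnabled: boolean | undefined
): boolean {
  // The global switch wins: when set to "true", grouping is off for every project.
  if (env["ALERT_GROUPING_GLOBAL_DISABLE"] === "true") {
    return false;
  }
  // Otherwise fall back to the project-level setting (disabled when unset).
  return projectGroupingEnabled ?? false;
}
```

On the server this would likely be called with `process.env` as the first argument. Taking the environment as a parameter keeps the check easy to unit test.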
### Data Preservation
- Episodes and members remain in database
- Alerts keep episodeId reference
- Can be re-enabled later without data loss
---
## Monitoring & Alerts
### Key Metrics
| Metric | Description | Threshold |
|--------|-------------|-----------|
| `episode_creation_rate` | Episodes created per hour | Monitor for anomalies |
| `grouping_latency_p99` | Time to group an alert | < 50ms |
| `episode_alert_ratio` | Avg alerts per episode | > 2 (effective grouping) |
| `grouping_engine_errors` | Errors in grouping | 0 |
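
As a rough illustration of how two of these metrics could be derived from raw samples, the sketch below computes a nearest-rank percentile for `grouping_latency_p99` and a simple average for `episode_alert_ratio`. The function names and the nearest-rank method are assumptions; a metrics backend may compute percentiles differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
export function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rawRank = Math.ceil((p / 100) * sorted.length);
  // Clamp the rank into [1, sorted.length] so p = 0 and p = 100 are safe.
  const rank = Math.min(sorted.length, Math.max(1, rawRank));
  return sorted[rank - 1] as number;
}

// Average number of alerts per episode; 0 when no episodes exist yet.
export function episodeAlertRatio(alertsGrouped: number, episodesCreated: number): number {
  return episodesCreated === 0 ? 0 : alertsGrouped / episodesCreated;
}
```

Against the thresholds in the table, a healthy system would show `percentile(latenciesMs, 99) < 50` and `episodeAlertRatio(...) > 2`.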
### Dashboards
Create monitoring dashboards for:
- Episode creation over time
- Grouping rule effectiveness
- Performance metrics
- Error rates
---
## Checklist
### Pre-Migration
- [ ] Review migration scripts
- [ ] Test migrations on staging
- [ ] Backup production database
- [ ] Prepare rollback procedure
### Migration
- [ ] Run migrations in order
- [ ] Verify table creation
- [ ] Verify index creation
- [ ] Verify foreign keys
### Post-Migration
- [ ] Deploy updated API
- [ ] Deploy updated Worker
- [ ] Deploy updated Dashboard
- [ ] Enable feature flags for alpha
- [ ] Monitor metrics
### GA Release
- [ ] Remove beta badges
- [ ] Update documentation
- [ ] Enable for new projects
- [ ] Announce to users


@@ -1,117 +0,0 @@
# Alert Grouping / Episodes Implementation Plan
## Overview
This sub-plan details the implementation of Alert Grouping and Episodes functionality for OneUptime. This feature groups related alerts into logical containers called "Episodes" to reduce noise and help operators focus on root causes rather than individual symptoms.
## Documents
| Document | Description |
|----------|-------------|
| [1-DataModels.md](./1-DataModels.md) | Database models and schema definitions |
| [2-Backend.md](./2-Backend.md) | Backend services and grouping engine |
| [3-API.md](./3-API.md) | REST API endpoints |
| [4-UI.md](./4-UI.md) | Frontend components and pages |
| [5-Migration.md](./5-Migration.md) | Database migrations and rollout |
## Feature Summary
### What is an Episode?
An **Episode** is a container that groups related alerts together. Instead of seeing 50 individual "connection timeout" alerts, operators see one episode: "Database Connectivity Issues (50 alerts)".
### Key Capabilities
1. **Automatic Grouping** - Rules-based grouping of alerts into episodes
2. **Time-Window Grouping** - Group alerts occurring within N minutes
3. **Field-Based Grouping** - Group by monitor, severity, labels, etc.
4. **Manual Management** - Merge, split, add/remove alerts from episodes
5. **Episode Lifecycle** - Active → Acknowledged → Resolved states
6. **Root Cause Tracking** - Document root cause analysis per episode
### User Stories
```
As an operator, I want to see related alerts grouped together
so that I can focus on root causes instead of individual symptoms.

As an operator, I want to acknowledge an entire episode at once
so that I don't have to acknowledge each alert individually.

As a team lead, I want to configure grouping rules
so that alerts are automatically organized by our team's workflow.

As an operator, I want to document the root cause of an episode
so that the team can learn from past incidents.
```
## Implementation Phases
### Phase 1: Core Models & Basic Grouping (Week 1-2)
- [ ] Create AlertEpisode model
- [ ] Create AlertEpisodeMember model
- [ ] Create AlertGroupingRule model
- [ ] Implement basic time-window grouping engine
- [ ] Integrate with alert creation flow
### Phase 2: Episode Management (Week 3)
- [ ] Episode state management (acknowledge, resolve)
- [ ] Episode assignment (owners, teams)
- [ ] Episode timeline tracking
- [ ] Manual alert management (add/remove)
### Phase 3: UI - List & Detail Views (Week 4-5)
- [ ] Episodes list page
- [ ] Episode detail page
- [ ] Episode actions (acknowledge, resolve, assign)
- [ ] Alert-to-episode linking in alerts table
### Phase 4: UI - Configuration (Week 6)
- [ ] Grouping rules list page
- [ ] Create/edit grouping rule form
- [ ] Rule testing functionality
- [ ] Episode badge in alerts table
### Phase 5: Advanced Features (Week 7-8)
- [ ] Field-based grouping
- [ ] Episode merge/split functionality
- [ ] Episode notifications
- [ ] Analytics and metrics
## Dependencies
### Existing Components Used
- `Alert` model and `AlertService`
- `AlertState` and `AlertStateTimeline`
- Dashboard routing and layout components
- ModelTable and ModelForm components
- On-call notification system
### New Components Created
- `AlertEpisode` model
- `AlertEpisodeMember` model
- `AlertGroupingRule` model
- `GroupingEngine` service
- Episode UI pages and components
## Success Metrics
| Metric | Target |
|--------|--------|
| Alert-to-episode ratio | 5:1 or higher |
| Episode acknowledgment time | 50% faster than individual alerts |
| User adoption | 80% of projects with grouping rules |
| Processing latency | < 30ms added to alert creation |
## References
- [Parent Plan: AlertEngine.md](../AlertEngine.md)
- [Splunk ITSI Episode Review](https://docs.splunk.com/Documentation/ITSI)
- [PagerDuty Alert Grouping](https://support.pagerduty.com/docs/alert-grouping)


@@ -0,0 +1,61 @@
# Alert Grouping / Episodes - Summary
## What is Alert Grouping?
Alert Grouping is a feature that automatically combines related alerts into logical containers called **Episodes**. Instead of seeing 50 individual "connection timeout" alerts, operators see one episode: "Database Connectivity Issues (50 alerts)".
## Key Capabilities
1. **Automatic Grouping** - Rules-based grouping of alerts into episodes
2. **Time-Window Grouping** - Group alerts occurring within N minutes
3. **Field-Based Grouping** - Group by monitor, monitor custom fields, severity, labels, etc.
4. **Manual Management** - Merge, split, add/remove alerts from episodes
5. **Episode Lifecycle** - Active → Acknowledged → Resolved states. Episode states should be linked to alert states.
6. **Root Cause Tracking** - Document root cause analysis per episode. This is a free-text field for the user to fill out. A "Generate with AI" option could also help summarize the episode based on the root causes of all the alerts in it.
7. **Flapping Prevention** - Grace periods before resolution and reopen windows
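
The lifecycle in item 5 can be sketched as a small transition table. The state names come from the list above; which transitions are allowed beyond the forward path (for example skipping acknowledgement, or reopening a resolved episode) is an assumption of this sketch, not something the summary specifies.

```typescript
export type EpisodeState = "Active" | "Acknowledged" | "Resolved";

// Allowed transitions. "Resolved" -> "Active" models a reopen, and
// "Active" -> "Resolved" (skipping acknowledgement) is an assumption.
const allowedTransitions: Record<EpisodeState, EpisodeState[]> = {
  Active: ["Acknowledged", "Resolved"],
  Acknowledged: ["Resolved"],
  Resolved: ["Active"],
};

export function canTransition(from: EpisodeState, to: EpisodeState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```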
## Data Models
### Three New Models
| Model | Purpose |
|-------|---------|
| **AlertEpisode** | Container for grouped alerts (title, state, severity, timing, ownership) |
| **AlertEpisodeMember** | Links alerts to episodes with metadata (addedBy, addedAt, similarityScore) |
| **AlertGroupingRule** | Configures automatic grouping behavior (match criteria, grouping config, priority) |
### Alert Model Enhancements
- `episodeId` - Link to parent episode
- `fingerprint` - Hash for deduplication
- `duplicateCount` - Number of duplicates suppressed
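
One possible way to derive `fingerprint` is a SHA-256 hash over a few identifying fields, which yields a 64-character hex string. Which fields identify "the same" alert is an assumption here (`projectId`, `monitorId`, and a normalized `title`), and `computeFingerprint` is a hypothetical helper name.

```typescript
import { createHash } from "crypto";

// Hypothetical input: the set of identifying fields is an assumption.
export interface FingerprintInput {
  projectId: string;
  monitorId: string;
  title: string;
}

export function computeFingerprint(input: FingerprintInput): string {
  // Normalize the title so differences in case or surrounding whitespace
  // do not produce a different hash.
  const canonical = JSON.stringify([
    input.projectId,
    input.monitorId,
    input.title.trim().toLowerCase(),
  ]);
  // SHA-256 rendered as hex is always 64 characters long.
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```

An incoming alert whose fingerprint matches an existing open alert in the same project could then increment `duplicateCount` instead of creating a new row.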
## Grouping Types
| Type | Description |
|------|-------------|
| **Time Window** | Groups alerts within N minutes of each other |
| **Field-Based** | Groups by matching fields (monitor, severity, labels) |
| **Smart** | ML-based similarity matching (future) |
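
A minimal sketch of how the first two grouping types could combine: a field-based key decides which alerts belong together, and a time window decides whether an existing episode is still eligible. The types, the function names, and measuring the window from the episode's `lastActivityAt` are all assumptions made for illustration.

```typescript
export interface IncomingAlert {
  monitorId: string;
  severity: string;
  createdAt: Date;
}

export interface OpenEpisode {
  id: string;
  groupKey: string;
  lastActivityAt: Date;
}

export type GroupField = "monitorId" | "severity";

// Field-based: alerts with identical values for the chosen fields share a key.
export function groupKey(alert: IncomingAlert, fields: GroupField[]): string {
  return fields.map((f) => f + "=" + alert[f]).join("|");
}

// Time window: an episode accepts the alert only if the alert arrives within
// `windowMinutes` of the episode's last activity.
export function findEpisode(
  alert: IncomingAlert,
  episodes: OpenEpisode[],
  fields: GroupField[],
  windowMinutes: number
): OpenEpisode | undefined {
  const key = groupKey(alert, fields);
  const windowMs = windowMinutes * 60 * 1000;
  for (const episode of episodes) {
    const elapsedMs = alert.createdAt.getTime() - episode.lastActivityAt.getTime();
    if (episode.groupKey === key && elapsedMs <= windowMs) {
      return episode;
    }
  }
  // No match: the caller would create a new episode.
  return undefined;
}
```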
## Flapping Prevention
- **resolveDelayMinutes** - Grace period before auto-resolving (prevents rapid state changes)
- **reopenWindowMinutes** - Window after resolution during which an episode can be reopened instead of creating a new one
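
The reopen window can be sketched as a three-way decision for an alert that matches an existing episode. `decideJoin` and the `JoinDecision` labels are hypothetical names; only `reopenWindowMinutes` comes from the summary.

```typescript
export type JoinDecision = "join-active" | "reopen" | "create-new";

export function decideJoin(
  episodeResolvedAt: Date | null,
  alertAt: Date,
  reopenWindowMinutes: number
): JoinDecision {
  // The episode is still open: the alert simply joins it.
  if (episodeResolvedAt === null) {
    return "join-active";
  }
  // The episode is resolved: reopen it only while inside the reopen window.
  const elapsedMs = alertAt.getTime() - episodeResolvedAt.getTime();
  return elapsedMs <= reopenWindowMinutes * 60 * 1000 ? "reopen" : "create-new";
}
```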
## On-Call Policy Resolution
Priority chain for notifications:
1. A grouping rule can be linked to an on-call policy. When an episode is created via a grouping rule, that rule's on-call policy is used.
2. If the alert has its own on-call policy, that policy is used as well, alongside the grouping rule's on-call policy.
3. If neither the grouping rule nor the alert has an on-call policy, no notifications are sent.
When an alert joins an episode, the alert's policy (if any) is executed as normal. The episode's on-call policy is also executed. This means that if an alert has an on-call policy, notifications may be sent twice: once for the alert and once for the episode. Once the episode's policy has been executed, it is NOT re-executed when further alerts join the episode.
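
One reading of the rules above, sketched as a pure function: the alert's own policies always run, while the grouping rule's policies run for the episode only once. The `planPolicies` helper, the `PolicyPlan` shape, and tracking execution with a boolean flag are assumptions for illustration.

```typescript
export interface PolicyPlan {
  // Policies executed for the alert itself, as normal.
  alertPolicyIds: string[];
  // Policies executed on behalf of the episode.
  episodePolicyIds: string[];
}

export function planPolicies(
  alertPolicyIds: string[],
  rulePolicyIds: string[],
  episodePolicyAlreadyExecuted: boolean
): PolicyPlan {
  return {
    alertPolicyIds: [...alertPolicyIds],
    // The episode's policy runs once; later alerts joining do not re-trigger it.
    episodePolicyIds: episodePolicyAlreadyExecuted ? [] : [...rulePolicyIds],
  };
}
```

When both result arrays are empty, no notifications are sent, which matches rule 3.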
### Worker Jobs
- **EpisodeAutoResolve** - Resolves episodes when all alerts are resolved
- **EpisodeBreakInactive** - Resolves episodes after an inactivity period
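
The checks these two jobs would run can be sketched as predicates over an episode snapshot. The `EpisodeSnapshot` shape and the `inactivityMinutes` parameter are hypothetical; applying `resolveDelayMinutes` as the grace period in the auto-resolve check is an assumption drawn from the Flapping Prevention section.

```typescript
export interface EpisodeSnapshot {
  alertStates: Array<"Active" | "Acknowledged" | "Resolved">;
  // When the most recent member alert was resolved, if any.
  lastAlertResolvedAt: Date | null;
  lastActivityAt: Date;
}

// EpisodeAutoResolve: every member alert is resolved and the grace period has passed.
export function shouldAutoResolve(
  episode: EpisodeSnapshot,
  now: Date,
  resolveDelayMinutes: number
): boolean {
  if (episode.alertStates.length === 0 || episode.lastAlertResolvedAt === null) {
    return false;
  }
  const allResolved = episode.alertStates.every((state) => state === "Resolved");
  const waitedMs = now.getTime() - episode.lastAlertResolvedAt.getTime();
  return allResolved && waitedMs >= resolveDelayMinutes * 60 * 1000;
}

// EpisodeBreakInactive: no activity for at least `inactivityMinutes`.
export function shouldBreakInactive(
  episode: EpisodeSnapshot,
  now: Date,
  inactivityMinutes: number
): boolean {
  const idleMs = now.getTime() - episode.lastActivityAt.getTime();
  return idleMs >= inactivityMinutes * 60 * 1000;
}
```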
## Database Migrations
Please do not write Database migrations. I will do that manually.