# Plan: Bring OneUptime Dashboards to Industry Parity and Beyond
## Context
OneUptime's dashboard implementation provides a 12-column grid layout with drag-and-drop editing, 3 widget types (Chart with Line/Bar, Value, Text with basic formatting), global time range with presets, view/edit modes, role-based permissions, and full-screen support. Dashboard config is stored as a single JSON column. Dashboards can only query OpenTelemetry metrics from ClickHouse.
This plan identifies the remaining gaps vs Grafana, Datadog, and New Relic, and proposes a phased implementation to build a best-in-class dashboard product that leverages OneUptime's unique position as an all-in-one observability + status page platform.
## Completed
The following features have been implemented:
- **12-Column Grid Layout** - Fixed grid with dynamic unit sizing, 60 default rows (expandable)
- **Drag-and-Drop Editing** - Move and resize components with bounds checking
- **Chart Widget** - Line and Bar chart types with single metric query, configurable title/description/legend
- **Value Widget** - Single metric aggregation displayed as large number
- **Text Widget** - Bold/Italic/Underline formatting (no markdown)
- **Global Time Range** - Presets (30min to 3mo) + custom date range picker
- **View/Edit Modes** - Read-only view with full-screen, edit mode with side panel settings
- **Role-Based Permissions** - ProjectOwner, ProjectAdmin, ProjectMember + custom permissions
- **Dashboard CRUD API** - Standard REST API with slug generation
- **Billing Enforcement** - Free plan limited to 1 dashboard
- **Area Chart** (Phase 1.1) - Area and Stacked Area chart types added to ChartType enum and rendered via chart component
- **Table Widget** (Phase 1.1) - New DashboardComponentType.Table with timestamp/value columns, sticky header, configurable max rows
- **Gauge Widget** (Phase 1.1) - SVG semi-circle gauge with threshold-based color coding (green/yellow/red), configurable min/max/thresholds
- **Template Variables** (Phase 1.2) - DashboardVariable type with CustomList, Query, and TextInput types; toolbar dropdown/input selectors; variable changes trigger widget refresh
- **Auto-Refresh** (Phase 1.3) - 7 interval options (5s to 15m), timer pauses in edit mode, pulsing indicator, interval persisted in dashboard config
- **Multiple Queries per Chart** (Phase 1.4) - metricQueryConfigs array support with fallback to single metricQueryConfig; each query rendered as a separate series
- **Markdown Support** (Phase 1.5) - isMarkdown flag on text components; renders LazyMarkdownViewer when enabled, falls back to bold/italic/underline when disabled
- **Threshold / Color Coding** (Phase 1.6) - Warning and critical threshold config on Value and Chart components; Value widget changes background/text color (green → yellow → red) based on thresholds
- **Legend Interaction** (Phase 1.7) - onValueChange enabled on Line, Area, and Bar charts for built-in Tremor legend click-to-toggle filtering
- **Chart Zoom** (Phase 1.8) - Time range zoom stack with Reset Zoom button in toolbar; pushing current range before zoom, popping on reset
## Gap Analysis Summary
| Feature | OneUptime | Grafana | Datadog | New Relic | Priority |
|---------|-----------|---------|---------|-----------|----------|
| Widget types | 5 (Chart, Value, Text, Table, Gauge) | 20+ | 40+ | 15+ | ~~**P0**~~ Done |
| Chart types | 4 (Line, Bar, Area, Stacked Area) | 10+ | 12+ | 10+ | ~~**P0**~~ Done |
| Template variables | 3 types (CustomList, Query, TextInput) | 6+ types | Yes | 3 types | ~~**P0**~~ Done |
| Auto-refresh | 7 intervals (5s to 15m) | Configurable | Real-time | Yes | ~~**P0**~~ Done |
| Log panels | None | Yes (Loki) | Yes | Yes (NRQL) | **P0** |
| Trace panels | None | Yes (Tempo) | Yes | Yes | **P0** |
| Table widget | Yes | Yes | Yes | Yes | ~~**P0**~~ Done |
| Multiple queries per chart | Yes (array) | Yes | Yes | Yes | ~~**P0**~~ Done |
| Markdown support | Yes (toggle) | Full markdown | Full markdown | Full markdown | ~~**P0**~~ Done |
| Threshold lines / color coding | Value widget color coding | Yes | Yes | Yes | ~~**P0**~~ Partial |
| Legend interaction (show/hide) | Yes (click toggle) | Yes | Yes | Yes | ~~**P0**~~ Done |
| Chart zoom | Yes (time range stack) | Yes | Yes | Yes | ~~**P0**~~ Done |
| Unified query plugin interface | None | Datasource plugins | Yes | NRQL | **P0** |
| Dashboard linking / drill-down | None | Data links | Yes | Facet linking | **P1** |
| Annotations / event overlays | None | Yes | Yes | Yes (Labs) | **P1** |
| Row/section grouping | None | Collapsible rows | Groups | No | **P1** |
| Public/shared dashboards | None | Yes | Yes | Yes | **P1** |
| JSON import/export | None | Yes | Yes | Yes | **P1** |
| Dashboard versioning | None | Yes | Yes | No | **P1** |
| Alert integration | None | Create from panel + show state | Yes | NRQL alerts | **P1** |
| TV/Kiosk mode | Full-screen only | Kiosk mode | Yes | Auto-cycling | **P1** |
| CSV export | None | Yes | Yes | Yes | **P1** |
| Custom time per widget | None | Yes (per-panel relative time) | Yes | No | **P1** |
| Perses/Grafana import | None | N/A | No | No | **P1** |
| AI dashboard creation | None | None | None | None | **P2** |
| Dashboard-as-code SDK | None | Foundation SDK | No | No | **P2** |
| Terraform provider | None | Yes | Yes | Yes | **P2** |
---
## Architecture: Query Plugin Interface & Perses Compatibility
Before implementing features, we should establish a `QueryPlugin` interface that all widget data sources use. This is a foundational architectural change that enables Phase 2 (logs, traces) and Phase 4 (Dashboard-as-Code) cleanly.
### Why Not Adopt Perses Wholesale?
[Perses](https://perses.dev) is a CNCF Sandbox project providing an open dashboard specification and embeddable UI components. We evaluated it as a potential protocol for our dashboard system. The decision is to **selectively borrow patterns** rather than adopt it fully:
**Against full adoption:**
- Our `AggregateBy` API queries ClickHouse directly. Perses assumes Prometheus/PromQL as the primary query language — mapping our ClickHouse aggregation queries into Perses's `PrometheusTimeSeriesQuery` plugin model adds unnecessary indirection.
- Phase 2 (click-to-correlate, cross-signal correlation) is our biggest differentiator. Perses has basic Tempo/Loki plugins but nothing like unified correlation. Adopting their panel model would constrain our ability to build these features.
- Perses is still at the CNCF Sandbox stage, and its data model structs are marked deprecated in favor of a new `perses/spec` repo. The spec is not yet stable enough to build a product on.
- Maintaining a translation layer between Perses spec and our internal `DashboardViewConfig` format for every feature would add ongoing overhead.
**What we selectively adopt:**
| Perses Concept | Where It Helps | How |
|---|---|---|
| `kind`+`spec` plugin pattern | Phase 1.9 (QueryPlugin), Phase 2.1-2.2 | Formalize widget data sources as plugins instead of hardcoding every widget type |
| Variable model with scoping | Phase 1.2 (Template Variables) | Adopt query-based, list, and text variable types with dashboard → project → global scoping |
| Decoupled layout from panels | Phase 3.4 (Sections) | Separate panel definitions from grid positions to make sections/grouping cleaner |
| Dashboard JSON schema | Phase 3.2 (Import/Export) | Support importing Perses-format dashboards alongside native format for Grafana migration path |
### 1.9 Unified Query Plugin Interface
**Current**: Widgets are hardcoded to query metrics via `MetricQueryConfigData` and the `AggregateBy` API. Adding logs or traces as data sources requires duplicating the entire query path.
**Target**: A `QueryPlugin` interface that abstracts data sources, enabling any widget to query metrics, logs, or traces through a unified contract.
**Design**:
```typescript
// The plugin pattern borrowed from Perses: kind + spec
interface QueryPlugin {
kind: "MetricQuery" | "LogQuery" | "TraceQuery" | "FormulaQuery";
spec: MetricQuerySpec | LogQuerySpec | TraceQuerySpec | FormulaQuerySpec;
}
interface MetricQuerySpec {
metricName: string;
attributes: JSONObject;
aggregationType: AggregationType;
groupBy?: string[];
}
interface LogQuerySpec {
severityFilter?: SeverityLevel[];
serviceFilter?: string[];
bodyContains?: string;
attributes?: JSONObject;
}
interface TraceQuerySpec {
serviceFilter?: string[];
operationFilter?: string[];
statusFilter?: TraceStatus[];
minDuration?: Duration;
}
interface FormulaQuerySpec {
formula: string; // e.g., "a / b * 100"
queries: Record<string, QueryPlugin>; // named sub-queries
}
// Each widget stores an array of QueryPlugins instead of MetricQueryConfigData
interface DashboardWidgetConfig {
queries: QueryPlugin[];
// ... other widget config
}
```
**Benefits**:
- Log stream and trace list widgets (Phase 2.1, 2.2) plug in without new query plumbing
- Cross-signal correlation (Phase 2.3) becomes a multi-query widget with mixed `kind` values
- Formula queries (Phase 1.4) compose naturally across query types
- Future data sources (e.g., external Prometheus, custom APIs) add a new `kind` without touching widget code
- Aligns with Perses's extensibility model without coupling to their spec
**Files to modify**:
- `Common/Types/Dashboard/QueryPlugin.ts` (new - interface definitions)
- `Common/Types/Metrics/MetricsQuery.ts` (refactor to implement MetricQuerySpec)
- `Common/UI/Utils/AnalyticsModelAPI/AnalyticsModelAPI.ts` (add query resolver that dispatches by `kind`)
- `Common/Server/API/BaseAnalyticsAPI.ts` (add unified query endpoint)
- `App/FeatureSet/Dashboard/src/Components/Metrics/Utils/Metrics.ts` (refactor fetchResults to use QueryPlugin)
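
The dispatch-by-`kind` resolver mentioned above could look roughly like the following. This is a minimal sketch, not the planned implementation: it models `QueryPlugin` as a discriminated union (a variation on the interface in the design, so the compiler can check that every `kind` is handled), trims the spec fields, and uses a hypothetical `QueryResult` shape in place of real ClickHouse calls.

```typescript
// Sketch only: each kind is paired with its own spec so that narrowing on
// `kind` also narrows `spec`. Field lists are trimmed for brevity.
type MetricQuery = {
  kind: "MetricQuery";
  spec: { metricName: string; aggregationType: string };
};
type LogQuery = { kind: "LogQuery"; spec: { bodyContains?: string } };
type TraceQuery = { kind: "TraceQuery"; spec: { serviceFilter?: string[] } };
type SketchQueryPlugin = MetricQuery | LogQuery | TraceQuery;

// Hypothetical result shape; a real resolver would return rows or series.
interface QueryResult {
  source: SketchQueryPlugin["kind"];
  description: string;
}

// Single entry point that widgets call, regardless of signal type.
function resolveQuery(query: SketchQueryPlugin): QueryResult {
  switch (query.kind) {
    case "MetricQuery":
      return {
        source: query.kind,
        description: `${query.spec.aggregationType}(${query.spec.metricName})`,
      };
    case "LogQuery":
      return {
        source: query.kind,
        description: `logs containing "${query.spec.bodyContains ?? "*"}"`,
      };
    case "TraceQuery":
      return {
        source: query.kind,
        description: `traces for ${(query.spec.serviceFilter ?? []).join(",") || "all services"}`,
      };
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case
      // makes this assignment fail to type-check.
      const unhandled: never = query;
      throw new Error(`Unknown query kind: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Adding a new data source then means adding one union member and one `case`, which is the property the plugin design is after.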
---
## Phase 1: Foundation (P0) — Close Critical Gaps
Every major competitor already ships the features in this phase. Without them, OneUptime dashboards are not competitive.
### 1.1 Add Core Chart Types: Area, Pie, Table, Gauge, Heatmap, Histogram ✅ (Partial)
**Status**: Area, Stacked Area, Table, and Gauge are implemented. Pie, Heatmap, and Histogram enum values are defined but rendering is not yet implemented.
**Current**: Line and Bar only.
**Target**: 8+ chart types covering all standard observability visualization needs.
**Implementation**:
- **Area Chart** (stacked and non-stacked): Extension of line chart with fill. Use existing chart library's area mode
- **Pie/Donut Chart**: For proportional breakdowns (e.g., error distribution by service). New component
- **Table Widget**: Tabular metric data, top-N lists, multi-column display with sortable columns. New component type
- **Gauge Widget**: Circular gauge with configurable min/max/thresholds and color zones. New component
- **Heatmap**: Time on X-axis, value buckets on Y-axis, color intensity for count. Essential for distribution/histogram metrics
- **Histogram**: Bar chart showing value distribution. Important for latency analysis
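
As an illustration of the heatmap layout described above (time on the X-axis, value buckets on the Y-axis, count as color intensity), here is a minimal bucketing sketch. The `HeatmapSample` and `HeatmapCell` shapes and the fixed-width buckets are assumptions made for the example; the real widget may bucket server-side in ClickHouse instead.

```typescript
// Hypothetical input: one raw sample per measurement.
interface HeatmapSample {
  timestampMs: number;
  value: number;
}

// One cell of the heatmap grid; `count` drives the color intensity.
interface HeatmapCell {
  timeBucketStartMs: number;
  valueBucketStart: number;
  count: number;
}

function bucketForHeatmap(
  samples: HeatmapSample[],
  timeBucketMs: number,
  valueBucketSize: number,
): HeatmapCell[] {
  const cells = new Map<string, HeatmapCell>();
  for (const sample of samples) {
    // Snap each sample down to the start of its time and value bucket.
    const timeBucketStartMs = Math.floor(sample.timestampMs / timeBucketMs) * timeBucketMs;
    const valueBucketStart = Math.floor(sample.value / valueBucketSize) * valueBucketSize;
    const key = `${timeBucketStartMs}:${valueBucketStart}`;
    const existing = cells.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      cells.set(key, { timeBucketStartMs, valueBucketStart, count: 1 });
    }
  }
  const result: HeatmapCell[] = [];
  cells.forEach((cell) => result.push(cell));
  // Sort by time, then by value bucket, so rendering order is stable.
  return result.sort(
    (a, b) =>
      a.timeBucketStartMs - b.timeBucketStartMs || a.valueBucketStart - b.valueBucketStart,
  );
}
```

The same function with a single time bucket spanning the whole range yields the counts a histogram needs.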
Each chart type needs:
- A new entry in `DashboardComponentType` or `ChartType` enum
- A rendering component in `Dashboard/Components/`
- Configuration options in the component settings side panel
**Files to modify**:
- `Common/Types/Dashboard/Chart/ChartType.ts` (add Area, Pie, Heatmap, Histogram, Gauge)
- `Common/Types/Dashboard/DashboardComponentType.ts` (add Table, Gauge)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (render new types)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardTableComponent.tsx` (new)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardGaugeComponent.tsx` (new)
### 1.2 Template Variables ✅
**Status**: Implemented. DashboardVariable type with CustomList, Query, and TextInput types. Toolbar renders dropdown selectors and text inputs. Variable changes trigger widget refresh via refreshTick.
**Current**: No template variables. Users must create separate dashboards for each service/host/environment.
**Target**: Drop-down variable selectors that dynamically filter all widgets.
**Implementation**:
- Create a `DashboardVariable` type stored in `dashboardViewConfig`:
- Name, label, type (query-based, custom list, text input)
- Query-based: runs a ClickHouse query to populate options (e.g., `SELECT DISTINCT service FROM MetricItem WHERE projectId = {pid}`)
- Custom list: manually defined options
- Multi-value selection support
- Render variables as dropdown selectors in the dashboard toolbar
- Variables can be referenced in metric queries as `$variable_name`
- When a variable changes, all widgets re-query with the new value
- Support cascading variables (variable B's query depends on variable A's value)
- **Scoping model (from Perses)**: Variables can be defined at dashboard, project, or global scope. Dashboard-level overrides project-level, which overrides global. This lets teams define org-wide variables (e.g., `$environment`) once and reuse across dashboards.
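
The scoping and `$variable_name` substitution rules above can be sketched as follows. The `ScopedVariable` shape and both function names are hypothetical; the only behavior taken from the plan is that dashboard scope overrides project scope, which overrides global scope.

```typescript
type VariableScope = "global" | "project" | "dashboard";

// Hypothetical flattened view of a variable with its currently selected value.
interface ScopedVariable {
  variableName: string;
  scope: VariableScope;
  value: string;
}

// Higher number wins when the same variable is defined at several scopes.
const scopePrecedence: Record<VariableScope, number> = {
  global: 0,
  project: 1,
  dashboard: 2,
};

function resolveVariables(variables: ScopedVariable[]): Map<string, string> {
  const winners = new Map<string, ScopedVariable>();
  for (const candidate of variables) {
    const current = winners.get(candidate.variableName);
    if (!current || scopePrecedence[candidate.scope] >= scopePrecedence[current.scope]) {
      winners.set(candidate.variableName, candidate);
    }
  }
  const resolved = new Map<string, string>();
  winners.forEach((variable, key) => resolved.set(key, variable.value));
  return resolved;
}

// Replaces $name references; unknown variables are left untouched so the
// problem stays visible instead of silently becoming an empty string.
function interpolate(template: string, values: Map<string, string>): string {
  return template.replace(/\$([A-Za-z_][A-Za-z0-9_]*)/g, (match: string, key: string) => {
    const value = values.get(key);
    return value !== undefined ? value : match;
  });
}
```

Plain string substitution is shown for clarity only. When the resolved value ends up in a ClickHouse query, it should be passed as a bound parameter or validated rather than concatenated into SQL.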
**Files to modify**:
- `Common/Types/Dashboard/DashboardVariable.ts` (new)
- `Common/Types/Dashboard/DashboardViewConfig.ts` (add variables array)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Toolbar/DashboardToolbar.tsx` (render variable dropdowns)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/DashboardView.tsx` (pass variable values to widgets)
- `Common/Server/Services/MetricService.ts` (resolve variable references in queries)
### 1.3 Auto-Refresh ✅
**Status**: Implemented. AutoRefreshInterval enum with 7 options (OFF, 5s, 10s, 30s, 1m, 5m, 15m). Timer management via setInterval/clearInterval, pauses in edit mode, pulsing blue dot indicator, interval persisted in DashboardViewConfig.
**Current**: Data goes stale after initial load.
**Target**: Configurable auto-refresh intervals.
**Implementation**:
- Add auto-refresh dropdown in toolbar with options: Off, 5s, 10s, 30s, 1m, 5m, 15m
- Store selected interval in dashboard config and URL state
- Use `setInterval` to trigger re-fetch on all metric widgets
- Show a subtle refresh indicator when data is being updated
- Pause auto-refresh when the dashboard is in edit mode
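
A minimal sketch of the timer logic described above, assuming a hypothetical `AutoRefreshController` class and string-literal options rather than the actual enum. It shows the two rules that matter: "Off" means no timer, and edit mode pauses refresh.

```typescript
type AutoRefreshOption = "Off" | "5s" | "10s" | "30s" | "1m" | "5m" | "15m";

// Interval length per option; null means auto-refresh is disabled.
const refreshIntervalMs: Record<AutoRefreshOption, number | null> = {
  Off: null,
  "5s": 5000,
  "10s": 10000,
  "30s": 30000,
  "1m": 60000,
  "5m": 300000,
  "15m": 900000,
};

class AutoRefreshController {
  private timer: ReturnType<typeof setInterval> | null = null;
  private onRefresh: () => void;

  constructor(onRefresh: () => void) {
    this.onRefresh = onRefresh;
  }

  // Call whenever the selected interval or the edit mode changes.
  update(option: AutoRefreshOption, isEditMode: boolean): void {
    this.stop(); // Always clear the old timer so intervals never stack up.
    const intervalMs = refreshIntervalMs[option];
    if (intervalMs === null || isEditMode) {
      return; // Disabled, or paused while the user is editing.
    }
    this.timer = setInterval(this.onRefresh, intervalMs);
  }

  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }

  isRunning(): boolean {
    return this.timer !== null;
  }
}
```

In a React component, `update` would typically be called from an effect that depends on the interval and mode, with `stop` as the cleanup.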
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Toolbar/DashboardToolbar.tsx` (add refresh dropdown)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/DashboardView.tsx` (implement refresh timer)
- `Common/Types/Dashboard/DashboardViewConfig.ts` (store refresh interval)
### 1.4 Multiple Queries per Chart with Formulas ✅ (Partial)
**Status**: Multiple queries implemented via metricQueryConfigs array with fallback to single metricQueryConfig. Each query renders as a separate series. Formula evaluation and dual Y-axis are not yet implemented.
**Current**: Single `MetricQueryConfigData` per chart.
**Target**: Overlay multiple metric series on a single chart for correlation, with cross-query formulas.
**Implementation**:
- Change chart component's data source from single `MetricQueryConfigData` to `QueryPlugin[]` (using the new unified interface)
- Each query gets its own alias and legend entry
- Support `FormulaQuery` plugin kind for cross-query formulas (e.g., `a / b * 100` where `a` and `b` reference other queries by alias)
- Y-axis: support dual Y-axes for metrics with different scales
- Formula evaluation happens server-side to avoid shipping raw data to the client
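
To make the formula step concrete, here is a small sketch of how an expression such as `a / b * 100` could be evaluated per timestamp. It is an illustration under assumptions, not the planned `FormulaEvaluator`: the grammar is limited to numbers, aliases, `+ - * /`, unary minus, and parentheses, and the `SeriesPoint` shape is hypothetical. A hand-written parser is used so that user-supplied formulas are never passed to `eval`.

```typescript
type SeriesPoint = { timestamp: number; value: number };

// Splits a formula into numbers, aliases, operators, and parentheses.
function tokenize(formula: string): string[] {
  const tokens = formula.match(/\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/()]/g) ?? [];
  if (tokens.join("") !== formula.replace(/\s+/g, "")) {
    throw new Error(`Unsupported character in formula: ${formula}`);
  }
  return tokens;
}

// Recursive-descent evaluation with the usual operator precedence.
function evaluateFormula(formula: string, variables: Record<string, number>): number {
  const tokens = tokenize(formula);
  let pos = 0;

  function parseExpression(): number {
    let left = parseTerm();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const right = parseTerm();
      left = op === "+" ? left + right : left - right;
    }
    return left;
  }

  function parseTerm(): number {
    let left = parseFactor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const right = parseFactor();
      left = op === "*" ? left * right : left / right; // x / 0 yields Infinity
    }
    return left;
  }

  function parseFactor(): number {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("Unexpected end of formula");
    if (token === "(") {
      const inner = parseExpression();
      if (tokens[pos++] !== ")") throw new Error("Missing closing parenthesis");
      return inner;
    }
    if (token === "-") return -parseFactor();
    if (/^\d/.test(token)) return parseFloat(token);
    if (/^[A-Za-z_]/.test(token)) {
      if (!Object.prototype.hasOwnProperty.call(variables, token)) {
        throw new Error(`Unknown query alias: ${token}`);
      }
      return variables[token] as number;
    }
    throw new Error(`Unexpected token: ${token}`);
  }

  const result = parseExpression();
  if (pos < tokens.length) throw new Error(`Unexpected token: ${tokens[pos]}`);
  return result;
}

// Applies the formula at every timestamp where all aliased series have a value.
function evaluateFormulaSeries(
  formula: string,
  seriesByAlias: Map<string, SeriesPoint[]>,
): SeriesPoint[] {
  const indexed = new Map<string, Map<number, number>>();
  const timestamps = new Set<number>();
  seriesByAlias.forEach((points, alias) => {
    const byTime = new Map<number, number>();
    points.forEach((p) => {
      byTime.set(p.timestamp, p.value);
      timestamps.add(p.timestamp);
    });
    indexed.set(alias, byTime);
  });
  const output: SeriesPoint[] = [];
  const ordered: number[] = [];
  timestamps.forEach((t) => ordered.push(t));
  ordered.sort((x, y) => x - y).forEach((timestamp) => {
    const variables: Record<string, number> = {};
    let missing = 0;
    indexed.forEach((byTime, alias) => {
      const value = byTime.get(timestamp);
      if (value === undefined) missing += 1;
      else variables[alias] = value;
    });
    if (missing === 0) output.push({ timestamp, value: evaluateFormula(formula, variables) });
  });
  return output;
}
```

Timestamps present in only some of the series are skipped here. Whether to skip, interpolate, or treat gaps as zero is a product decision the sketch does not settle.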
**Files to modify**:
- `Common/Utils/Dashboard/Components/DashboardChartComponent.ts` (change to QueryPlugin array)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (render multiple series)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Canvas/ComponentSettingsSideOver.tsx` (multi-query config UI)
- `Common/Server/Services/FormulaEvaluator.ts` (new - server-side formula evaluation)
### 1.5 Full Markdown Support for Text Widget ✅
**Status**: Implemented. isMarkdown boolean flag added to DashboardTextComponent. When enabled, renders via LazyMarkdownViewer. When disabled, falls back to existing bold/italic/underline formatting.
**Current**: Only bold, italic, underline formatting.
**Target**: Full markdown rendering including headers, links, lists, code blocks, tables, and images.
**Implementation**:
- Replace the current custom formatting with a markdown renderer (e.g., `react-markdown` or `marked`)
- Support: headers (h1-h6), links, ordered/unordered lists, code blocks with syntax highlighting, tables, images, blockquotes
- Edit mode: raw markdown text area with preview toggle
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardTextComponent.tsx` (replace with markdown renderer)
- `Common/Utils/Dashboard/Components/DashboardTextComponent.ts` (store raw markdown)
### 1.6 Threshold Lines & Color Coding ✅ (Partial)
**Status**: Warning/critical threshold config added to Chart and Value components. Value widget implements color coding (green → yellow → red background/text). Chart threshold reference lines are configured in the data model but visual rendering on charts requires Tremor chart library modifications (deferred).
**Current**: No threshold visualization.
**Target**: Configurable warning/critical thresholds on charts with color-coded regions.
**Implementation**:
- Add threshold configuration to chart settings: value, label, color (default: yellow for warning, red for critical)
- Render horizontal lines on the chart at threshold values
- Optionally fill regions above/below thresholds with translucent color
- For value/billboard widgets: change background color based on which threshold range the value falls in (green/yellow/red)
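
The value-widget color rule can be expressed in a few lines. This sketch assumes that higher values are worse (as with latency or error rate) and that a value equal to a threshold already counts as crossing it; both are assumptions, and `ThresholdConfig` is a hypothetical shape.

```typescript
type ThresholdColor = "green" | "yellow" | "red";

// Both thresholds are optional; a widget may define only one of them.
interface ThresholdConfig {
  warningThreshold?: number;
  criticalThreshold?: number;
}

function colorForValue(value: number, config: ThresholdConfig): ThresholdColor {
  // Check critical first so it wins when both thresholds are crossed.
  if (config.criticalThreshold !== undefined && value >= config.criticalThreshold) {
    return "red";
  }
  if (config.warningThreshold !== undefined && value >= config.warningThreshold) {
    return "yellow";
  }
  return "green";
}
```

Metrics where lower is worse (for example, availability) would need an inverted comparison, which could be a per-widget flag.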
**Files to modify**:
- `Common/Utils/Dashboard/Components/DashboardChartComponent.ts` (add thresholds config)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (render threshold lines)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardValueComponent.tsx` (color coding)
### 1.7 Legend Interaction (Show/Hide Series) ✅
**Status**: Implemented. onValueChange callback enabled on Line, Area, and Bar chart components, activating Tremor's built-in activeLegend state management for click-to-toggle series visibility.
**Current**: Legends are display-only.
**Target**: Click legend items to toggle series visibility.
**Implementation**:
- Add click handler on legend items to toggle series visibility
- Clicked-off series should be visually dimmed in the legend and removed from the chart
- Support "isolate" mode: Ctrl+Click shows only that series and hides all others
- Persist toggled state during the session (reset on page reload)
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Metrics/MetricGraph.tsx` (add legend click handlers)
### 1.8 Chart Zoom (Click-Drag Time Selection) ✅
**Status**: Implemented via time range stack. Toolbar shows Reset Zoom button when zoomed in. Current range is pushed to stack before zoom, popped on reset. In-chart brush/drag selection not yet implemented (uses toolbar-driven zoom instead).
**Current**: No zoom capability.
**Target**: Click and drag on a time series chart to zoom into a time range.
**Implementation**:
- Enable brush/selection mode on time series charts
- When user drags to select a range, update the global time range to the selected range
- Show a "Reset zoom" button to return to the previous time range
- Maintain a zoom stack so users can zoom in multiple times and zoom back out
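
The zoom stack behaves like a simple undo history for the global time range. A minimal sketch, with hypothetical `ZoomRange` and `ZoomStack` names:

```typescript
interface ZoomRange {
  startMs: number;
  endMs: number;
}

class ZoomStack {
  private previous: ZoomRange[] = [];
  private current: ZoomRange;

  constructor(initial: ZoomRange) {
    this.current = initial;
  }

  // Push the current range before replacing it, so it can be restored later.
  zoomTo(range: ZoomRange): void {
    if (range.endMs <= range.startMs) {
      throw new Error("Zoom range must have a positive duration");
    }
    this.previous.push(this.current);
    this.current = range;
  }

  // Pop one level; at the outermost level this is a no-op.
  zoomOut(): ZoomRange {
    const prior = this.previous.pop();
    if (prior !== undefined) {
      this.current = prior;
    }
    return this.current;
  }

  // Drives the visibility of the "Reset zoom" button.
  canZoomOut(): boolean {
    return this.previous.length > 0;
  }

  getCurrent(): ZoomRange {
    return this.current;
  }
}
```

Changing the global time range through the preset picker would normally clear the stack, since the saved ranges no longer describe a "previous" view.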
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Metrics/MetricGraph.tsx` (add brush selection)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/DashboardView.tsx` (handle time range updates from zoom)
---
## Phase 2: Observability Integration (P0-P1) — Leverage the Full Platform
This is where OneUptime can differentiate: metrics, logs, and traces in one platform. The `QueryPlugin` interface from Phase 1.9 makes this phase significantly easier — each new signal type is a new `kind` in the plugin system rather than a new query pipeline.
### 2.1 Log Stream Widget
**Current**: Dashboards can only show metrics.
**Target**: Widget that displays a live log stream with filtering.
**Implementation**:
- New `DashboardComponentType.LogStream` widget type
- Uses `QueryPlugin` with `kind: "LogQuery"` — same interface as metric widgets
- Configuration: log query filter, severity filter, service filter, max rows
- Renders as a scrolling log list with severity color coding, timestamp, and body
- Click a log entry to expand and see full details
- Respects dashboard time range and template variables
**Files to modify**:
- `Common/Types/Dashboard/DashboardComponentType.ts` (add LogStream)
- `Common/Utils/Dashboard/Components/DashboardLogStreamComponent.ts` (new - config)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardLogStreamComponent.tsx` (new - rendering)
- `Common/Server/Services/LogQueryResolver.ts` (new - implements QueryPlugin for logs)
### 2.2 Trace List Widget
**Current**: No trace visualization in dashboards.
**Target**: Widget showing a filtered trace list with duration and status.
**Implementation**:
- New `DashboardComponentType.TraceList` widget type
- Uses `QueryPlugin` with `kind: "TraceQuery"` — same interface as metric and log widgets
- Configuration: service filter, operation filter, status filter, min duration
- Renders as a table: trace ID, operation, service, duration, status, timestamp
- Click a row to navigate to the full trace view
- Respects dashboard time range and template variables
**Files to modify**:
- `Common/Types/Dashboard/DashboardComponentType.ts` (add TraceList)
- `Common/Utils/Dashboard/Components/DashboardTraceListComponent.ts` (new)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardTraceListComponent.tsx` (new)
- `Common/Server/Services/TraceQueryResolver.ts` (new - implements QueryPlugin for traces)
### 2.3 Click-to-Correlate Across Signals
**Current**: No cross-signal correlation in dashboards.
**Target**: Click a point on a metric chart to instantly see related logs and traces from that timestamp.
**Implementation**:
- When clicking a data point on a metric chart, open a correlation panel showing:
- Logs from the same service and time window (+/- 5 minutes around the clicked point)
- Traces from the same service and time window
- Filtered by the same template variables
- The correlation panel uses the `QueryPlugin` interface internally — it fires a `LogQuery` and `TraceQuery` scoped to the clicked timestamp and service context
- The correlation panel appears as a slide-over or split view below the chart
- This is a major differentiator vs Grafana (which requires separate datasources) and ties into OneUptime's all-in-one advantage
- Perses offers nothing comparable, and we are not aware of a competitor with this level of built-in cross-signal correlation
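
The click handler's job reduces to turning one clicked point into two scoped queries. The sketch below assumes a hypothetical `CorrelationContext` and a separate `timeRange` object, since the `LogQuerySpec` and `TraceQuerySpec` shown in Phase 1.9 carry no time fields of their own.

```typescript
// +/- 5 minutes around the clicked point, as described above.
const CORRELATION_WINDOW_MS = 5 * 60 * 1000;

// Hypothetical context captured when the user clicks a data point.
interface CorrelationContext {
  clickedAt: Date;
  serviceName: string;
}

interface CorrelationPlan {
  timeRange: { startTime: Date; endTime: Date };
  logQuery: { kind: "LogQuery"; spec: { serviceFilter: string[] } };
  traceQuery: { kind: "TraceQuery"; spec: { serviceFilter: string[] } };
}

function buildCorrelationQueries(context: CorrelationContext): CorrelationPlan {
  const clickedMs = context.clickedAt.getTime();
  return {
    timeRange: {
      startTime: new Date(clickedMs - CORRELATION_WINDOW_MS),
      endTime: new Date(clickedMs + CORRELATION_WINDOW_MS),
    },
    // Both queries share the service filter so the panel shows one service.
    logQuery: { kind: "LogQuery", spec: { serviceFilter: [context.serviceName] } },
    traceQuery: { kind: "TraceQuery", spec: { serviceFilter: [context.serviceName] } },
  };
}
```

Template variable values would be merged into both specs in the same way before the queries are sent.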
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (add click handler)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/CorrelationPanel.tsx` (new - shows correlated logs/traces)
### 2.4 Annotations / Event Overlays
**Current**: No event markers on charts.
**Target**: Show deployment events, incidents, and alerts as vertical markers on time series charts.
**Implementation**:
- Query OneUptime's own data for events in the chart's time range:
- Incidents (from Incident model)
- Deployments (can be sent as OTLP resource attributes or a custom event API)
- Alert triggers (from monitor alert history)
- Render as vertical dashed lines with icons on hover
- Color-code by type: red for incidents, blue for deployments, yellow for alerts
- Allow users to add manual annotations (text + timestamp)
**Files to modify**:
- `Common/Types/Dashboard/DashboardAnnotation.ts` (new)
- `App/FeatureSet/Dashboard/src/Components/Metrics/MetricGraph.tsx` (render annotation markers)
- `Common/Server/API/DashboardAnnotationAPI.ts` (new - query events)
### 2.5 Alert Integration
**Current**: No connection between dashboards and alerting.
**Target**: Create alerts from dashboard panels and display alert state on panels.
**Implementation**:
- "Create Alert" button in chart settings that pre-fills a metric monitor with the chart's query
- Show alert state indicator on chart headers (green/yellow/red dot) based on associated monitor status
- Alert status widget: shows a summary of all active alerts with severity and duration
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Canvas/ComponentSettingsSideOver.tsx` (add "Create Alert" button)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (show alert state)
- `Common/Types/Dashboard/DashboardComponentType.ts` (add AlertStatus widget type)
### 2.6 SLO/SLI Widget
**Current**: No SLO visualization.
**Target**: Dedicated widget showing SLO status, error budget burn rate, and remaining budget.
**Implementation** (depends on Metrics roadmap Phase 3.2 - SLO/SLI Tracking):
- New `DashboardComponentType.SLO` widget type
- Configuration: select an SLO definition
- Displays: current attainment (%), target (%), error budget remaining (%), burn rate chart
- Color-coded: green (healthy), yellow (burning fast), red (budget exhausted)
**Files to modify**:
- `Common/Types/Dashboard/DashboardComponentType.ts` (add SLO)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardSLOComponent.tsx` (new)
---
## Phase 3: Collaboration & Sharing (P1) — Production Workflows
### 3.1 Public/Shared Dashboards
**Current**: Dashboards require login.
**Target**: Share dashboards with external stakeholders without requiring authentication.
**Implementation**:
- Add `isPublic` flag and `publicAccessToken` to Dashboard model
- Generate a shareable URL with token: `/public/dashboard/{token}`
- Public view is read-only with no editing controls
- Option to restrict public access to specific IP ranges
- Render without the OneUptime navigation chrome
**Files to modify**:
- `Common/Models/DatabaseModels/Dashboard.ts` (add isPublic, publicAccessToken)
- `App/FeatureSet/Dashboard/src/Pages/Public/Dashboard.tsx` (new - public dashboard view)
### 3.2 JSON Import/Export with Perses & Grafana Compatibility
**Current**: No import/export capability.
**Target**: Export dashboards as JSON and re-import for backup, migration, and dashboard-as-code. Support importing Perses and Grafana dashboard formats.
**Implementation**:
- **Native export**: Serialize `dashboardViewConfig` + metadata (name, description, variables) as a JSON file download. Include a schema version for forward compatibility.
- **Perses-compatible export**: Alongside native format, output a Perses-spec-compatible JSON. This gives users interoperability with the CNCF ecosystem without coupling our internals. Map our `QueryPlugin` kinds to Perses panel plugin types where possible.
- **Grafana import**: Perses already has tooling to convert Grafana dashboards to Perses format. By supporting Perses import, we get Grafana migration for free: Grafana → Perses → OneUptime.
- **Import pipeline**: Upload a JSON file → detect format (native, Perses, Grafana) → translate to `DashboardViewConfig` → validate → create dashboard.
- Handle version compatibility (include a schema version in the export)
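
The "detect format" step of the import pipeline could start from structural heuristics like the ones below. This is a sketch under assumptions: the native marker field `format: "oneuptime-dashboard"` is hypothetical (the native export format is not defined yet), the Perses check relies on its `kind`/`spec` resource wrapper, and the Grafana check relies on the top-level `panels` array and numeric `schemaVersion` found in Grafana dashboard JSON.

```typescript
type DashboardFormat = "native" | "perses" | "grafana" | "unknown";

function detectDashboardFormat(json: unknown): DashboardFormat {
  if (typeof json !== "object" || json === null || Array.isArray(json)) {
    return "unknown";
  }
  const doc = json as Record<string, unknown>;

  // Assumption: our own exporter writes an explicit marker plus a schema
  // version. Checked first because Grafana also uses `schemaVersion`.
  if (doc["format"] === "oneuptime-dashboard" && typeof doc["schemaVersion"] === "number") {
    return "native";
  }
  // Perses resources are wrapped as { kind, metadata, spec }.
  if (doc["kind"] === "Dashboard" && typeof doc["spec"] === "object" && doc["spec"] !== null) {
    return "perses";
  }
  // Grafana dashboards keep panels at the top level.
  if (Array.isArray(doc["panels"]) && typeof doc["schemaVersion"] === "number") {
    return "grafana";
  }
  return "unknown";
}
```

Detection only selects a converter. Each converter still has to validate the document, and anything classified as `unknown` should be rejected with a clear error rather than guessed at.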
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Pages/Dashboards/Dashboards.tsx` (add import button)
- `App/FeatureSet/Dashboard/src/Pages/Dashboards/View/Settings.tsx` (add export button)
- `Common/Server/API/DashboardImportExportAPI.ts` (new)
- `Common/Server/Utils/Dashboard/PersesConverter.ts` (new - bidirectional Perses format conversion)
- `Common/Server/Utils/Dashboard/GrafanaConverter.ts` (new - Grafana JSON to native format)
### 3.3 Dashboard Versioning
**Current**: No change history.
**Target**: Track changes to dashboards over time with the ability to view history and revert.
**Implementation**:
- Create `DashboardVersion` model: dashboardId, version number, config snapshot, changedBy, timestamp
- On each save, create a new version entry
- UI: "Version History" in settings showing a list of versions with timestamps and authors
- "Restore" button to revert to a previous version
- Optional: diff view comparing two versions
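
The save and restore flow can be sketched with an in-memory store standing in for the `DashboardVersion` model. One design choice is assumed here and is not stated in the plan: restoring appends a new version that copies the old snapshot, so history is never rewritten. All names below are hypothetical.

```typescript
interface DashboardVersionRecord {
  dashboardId: string;
  versionNumber: number;
  configSnapshot: string; // JSON-serialized dashboardViewConfig
  changedBy: string;
  createdAt: Date;
}

// In-memory stand-in for the database-backed model.
class DashboardVersionStore {
  private records: DashboardVersionRecord[] = [];

  listVersions(dashboardId: string): DashboardVersionRecord[] {
    return this.records.filter((r) => r.dashboardId === dashboardId);
  }

  // Called on every dashboard save.
  saveVersion(dashboardId: string, config: unknown, changedBy: string): DashboardVersionRecord {
    const record: DashboardVersionRecord = {
      dashboardId,
      versionNumber: this.listVersions(dashboardId).length + 1,
      configSnapshot: JSON.stringify(config),
      changedBy,
      createdAt: new Date(),
    };
    this.records.push(record);
    return record;
  }

  // Restoring creates a new version from the old snapshot instead of
  // deleting the versions in between.
  restoreVersion(
    dashboardId: string,
    versionNumber: number,
    restoredBy: string,
  ): DashboardVersionRecord {
    const target = this.listVersions(dashboardId).find(
      (r) => r.versionNumber === versionNumber,
    );
    if (!target) {
      throw new Error(`Version ${versionNumber} not found for dashboard ${dashboardId}`);
    }
    return this.saveVersion(dashboardId, JSON.parse(target.configSnapshot), restoredBy);
  }
}
```

A real implementation would assign version numbers inside a transaction, since counting existing rows is not safe under concurrent saves.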
**Files to modify**:
- `Common/Models/DatabaseModels/DashboardVersion.ts` (new)
- `Common/Server/Services/DashboardService.ts` (create version on save)
- `App/FeatureSet/Dashboard/src/Pages/Dashboards/View/VersionHistory.tsx` (new)
### 3.4 Row/Section Grouping with Decoupled Layout
**Current**: Components placed freely with no grouping. Panel definitions and grid positions are mixed together in each component.
**Target**: Collapsible rows/sections for organizing related panels, with layout decoupled from panel definitions.
**Implementation**:
- **Decouple layout from panels** (pattern from Perses): Separate panel definitions (what to render) from layout definitions (where to render it). Panels are stored in a `panels` map keyed by ID. Layouts reference panels by `$ref`. This makes it easier to rearrange panels without modifying their query/display config.
- Add a "Section" component type that acts as a collapsible container
- Section has a title bar that can be clicked to collapse/expand
- When collapsed, hides all components within the section's vertical range
- Sections can be nested one level deep
- Migration: existing `DashboardViewConfig` components are automatically split into panel + layout entries on first load
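
The migration step above amounts to splitting each existing component into a panel entry and a layout entry. The sketch below uses simplified, hypothetical field names (`top`, `left`, `width`, `height`, `settings`) and a `#/panels/<id>` reference format; the actual `DashboardViewConfig` fields and `$ref` syntax may differ.

```typescript
// Hypothetical shape of a component as stored today: display config and
// grid position live in the same object.
interface LegacyComponent {
  componentId: string;
  componentType: string;
  top: number;
  left: number;
  width: number;
  height: number;
  settings: Record<string, unknown>;
}

interface PanelDefinition {
  componentType: string;
  settings: Record<string, unknown>;
}

interface LayoutItem {
  $ref: string; // e.g. "#/panels/<componentId>"
  top: number;
  left: number;
  width: number;
  height: number;
}

interface DecoupledViewConfig {
  panels: Record<string, PanelDefinition>;
  layouts: LayoutItem[];
}

function splitComponents(components: LegacyComponent[]): DecoupledViewConfig {
  const config: DecoupledViewConfig = { panels: {}, layouts: [] };
  for (const c of components) {
    if (Object.prototype.hasOwnProperty.call(config.panels, c.componentId)) {
      throw new Error(`Duplicate component ID: ${c.componentId}`);
    }
    config.panels[c.componentId] = { componentType: c.componentType, settings: c.settings };
    config.layouts.push({
      $ref: `#/panels/${c.componentId}`,
      top: c.top,
      left: c.left,
      width: c.width,
      height: c.height,
    });
  }
  return config;
}

// Looks up the panel a layout item points at.
function resolvePanel(config: DecoupledViewConfig, item: LayoutItem): PanelDefinition {
  const panelId = item.$ref.replace(/^#\/panels\//, "");
  const panel = config.panels[panelId];
  if (panel === undefined) {
    throw new Error(`Layout references missing panel: ${item.$ref}`);
  }
  return panel;
}
```

After the split, moving or resizing a widget only touches `layouts`, and editing a query only touches `panels`.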
**Files to modify**:
- `Common/Types/Dashboard/DashboardViewConfig.ts` (add panels map + layouts array, deprecate inline component positions)
- `Common/Types/Dashboard/DashboardComponentType.ts` (add Section)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardSectionComponent.tsx` (new)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Canvas/Index.tsx` (handle section collapse, resolve panel refs)
### 3.5 TV/Kiosk Mode
**Current**: Full-screen only.
**Target**: Dedicated kiosk mode optimized for wall-mounted monitors with auto-cycling.
**Implementation**:
- Kiosk mode: hides all chrome (toolbar, navigation, URL bar), shows only the dashboard grid
- Auto-cycle: rotate through a list of dashboards at a configurable interval (30s, 1m, 5m)
- Dashboard playlist: define an ordered list of dashboards to cycle through
- Support per-dashboard display duration
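
Playlist cycling with per-dashboard durations can be reduced to one pure function: given the elapsed time, which dashboard should be on screen? The `PlaylistEntry` shape and function name are hypothetical, and the sketch assumes a non-negative elapsed time.

```typescript
interface PlaylistEntry {
  dashboardId: string;
  durationSeconds?: number; // Falls back to the playlist-wide default.
}

function dashboardAt(
  playlist: PlaylistEntry[],
  defaultDurationSeconds: number,
  elapsedSeconds: number,
): string {
  if (playlist.length === 0) {
    throw new Error("Playlist is empty");
  }
  const slots = playlist.map((entry) => ({
    dashboardId: entry.dashboardId,
    duration: entry.durationSeconds ?? defaultDurationSeconds,
  }));
  const cycleLength = slots.reduce((sum, slot) => sum + slot.duration, 0);
  if (cycleLength <= 0) {
    throw new Error("Playlist durations must add up to a positive number");
  }
  // Position within the current loop of the playlist.
  let remaining = elapsedSeconds % cycleLength;
  let lastId = "";
  for (const slot of slots) {
    if (remaining < slot.duration) {
      return slot.dashboardId;
    }
    remaining -= slot.duration;
    lastId = slot.dashboardId;
  }
  return lastId; // Not reached for non-negative elapsedSeconds.
}
```

Keeping this logic free of timers makes it easy to test. The kiosk view would call it from a one-second tick or schedule a single timeout for the next switch.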
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Pages/Dashboards/Kiosk.tsx` (new - kiosk view)
- `Common/Models/DatabaseModels/DashboardPlaylist.ts` (new - playlist model)
### 3.6 CSV Export
**Current**: No data export.
**Target**: Export chart/table data as CSV for offline analysis.
**Implementation**:
- Add "Export CSV" option in chart/table context menu
- Client-side: serialize the current rendered data to CSV format
- Include column headers, timestamps, and values
- Trigger browser download
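
The serialization step is small but easy to get wrong when values contain commas or quotes. A minimal sketch following the usual RFC 4180 conventions (quote fields that contain commas, quotes, or line breaks; double any embedded quotes; separate records with CRLF); the function names are hypothetical:

```typescript
// Quote a field only when it needs it, doubling any embedded quotes.
function escapeCsvField(field: string): string {
  return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

function toCsv(headers: string[], rows: Array<Array<string | number>>): string {
  const allRows: Array<Array<string | number>> = [headers, ...rows];
  return allRows
    .map((row) => row.map((cell) => escapeCsvField(String(cell))).join(","))
    .join("\r\n");
}
```

For example, `toCsv(["timestamp", "value"], [["2024-01-01T00:00:00Z", 1.5]])` produces a header line followed by one data line. In the browser, the resulting string would be wrapped in a `Blob` and offered through a temporary download link.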
**Files to modify**:
- `App/FeatureSet/Dashboard/src/Components/Dashboard/Components/DashboardChartComponent.tsx` (add export option)
- `Common/Utils/Dashboard/CSVExport.ts` (new - CSV serialization)
### 3.7 Custom Time Range per Widget
**Current**: All widgets share the global time range.
**Target**: Individual widgets can override the global time range.
**Implementation**:
- Add optional `timeRangeOverride` to each component's config
- When set, the widget uses its own time range instead of the global one
- Show a small clock icon on widgets with custom time ranges
- Configuration in the component settings side panel
**Files to modify**:
- `Common/Utils/Dashboard/Components/DashboardBaseComponent.ts` (add timeRangeOverride)
- `App/FeatureSet/Dashboard/src/Components/Dashboard/DashboardView.tsx` (pass per-widget time ranges)
---
## Phase 4: Differentiation (P2-P3) — Surpass Competition
### 4.1 AI-Powered Dashboard Creation
**Current**: Manual dashboard creation only.
**Target**: Natural-language dashboard creation: a prompt such as "Show me CPU usage by service for the last 24 hours" automatically creates the matching widget.
**Implementation**:
- Natural language input in the "Add Widget" dialog
- AI translates to: metric name, aggregation, group by, chart type, time range
- Uses available MetricType metadata to match metric names
- Preview the generated widget before adding to dashboard
- No competitor has done this well yet, which makes it a major differentiator
### 4.2 Pre-Built Dashboard Templates
**Current**: No templates.
**Target**: One-click dashboard templates for common stacks.
**Implementation**:
- Template library: Node.js, Python, Go, Java, Kubernetes, PostgreSQL, Redis, Nginx, MongoDB, etc.
- Auto-detect relevant templates based on ingested telemetry data
- "One-click create" instantiates a full dashboard from the template
- Community template sharing (future)
### 4.3 Auto-Generated Dashboards
**Current**: Users must manually build dashboards.
**Target**: When a service connects, auto-generate a relevant dashboard.
**Implementation**:
- On first telemetry ingest from a new service, analyze the metric names and types
- Auto-create a service dashboard with relevant charts based on detected metrics
- Include golden signals (latency, traffic, errors, saturation) where applicable
- Notify the user and link to the auto-generated dashboard
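One way the metric analysis step might start is a keyword heuristic that buckets metric names into the four golden signals. Everything below is an assumption for illustration: the keyword patterns, their precedence, and the function names are not part of the plan, and a real implementation would likely also use metric type and unit metadata.

```typescript
type GoldenSignal = "latency" | "traffic" | "errors" | "saturation";

// Illustrative keyword rules only. Order matters because the first match wins,
// so "http.server.error.count" lands in errors rather than traffic.
const SIGNAL_RULES: Array<{ signal: GoldenSignal; pattern: RegExp }> = [
  { signal: "errors", pattern: /error|fail|exception/i },
  { signal: "latency", pattern: /duration|latency/i },
  { signal: "saturation", pattern: /utilization|usage|queue/i },
  { signal: "traffic", pattern: /request|count|throughput/i },
];

// Returns the first matching signal, or undefined when no rule applies.
function classifyMetric(name: string): GoldenSignal | undefined {
  for (const rule of SIGNAL_RULES) {
    if (rule.pattern.test(name)) {
      return rule.signal;
    }
  }
  return undefined;
}

// Groups detected metric names by signal; unmatched metrics are left out.
function groupByGoldenSignal(
  metricNames: string[]
): Record<GoldenSignal, string[]> {
  const groups: Record<GoldenSignal, string[]> = {
    latency: [],
    traffic: [],
    errors: [],
    saturation: [],
  };
  for (const name of metricNames) {
    const signal = classifyMetric(name);
    if (signal) {
      groups[signal].push(name);
    }
  }
  return groups;
}
```

Each non-empty group could then become one row of charts on the generated service dashboard.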
### 4.4 Customer-Facing Dashboards on Status Pages
**Current**: Status pages and dashboards are separate.
**Target**: Embed dashboard widgets on status pages for real-time performance visibility.
**Implementation**:
- Allow selecting specific dashboard widgets to embed on a status page
- Render widgets in read-only mode without internal navigation
- Respect public/private data boundaries (only show metrics the customer should see)
- This is unique to OneUptime - no competitor has integrated observability dashboards with status pages
### 4.5 Dashboard-as-Code SDK (Perses-Compatible)
**Current**: No programmatic dashboard creation.
**Target**: TypeScript SDK for defining dashboards as code, with optional Perses-compatible output.
**Implementation**:
```typescript
const dashboard = new Dashboard("Service Health")
  .addVariable("service", { type: "query", query: "SELECT DISTINCT service FROM MetricItem" })
  .addRow("Latency")
  .addChart({ metric: "http.server.duration", aggregation: "p99", groupBy: ["$service"] })
  .addChart({ metric: "http.server.duration", aggregation: "p50", groupBy: ["$service"] })
  .addRow("Throughput")
  .addChart({ metric: "http.server.request.count", aggregation: "rate", groupBy: ["$service"] });
// Output native OneUptime format
dashboard.toJSON();
// Output Perses-compatible format for ecosystem interop
dashboard.toPerses();
```
- SDK generates the JSON config and uses the Dashboard API to create/update
- Git-based provisioning: store dashboard definitions in repo, CI/CD syncs to OneUptime
- `toPerses()` output allows users to share dashboard definitions with teams using Perses or other CNCF-compatible tools
- Perses's CUE SDK patterns can inform our builder API design
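As a rough sketch of how a fluent builder like the one above might accumulate state, the class below tracks rows and attaches each chart to the most recently added row. The `DashboardBuilder` name, the `RowSpec` and `ChartSpec` types, and the JSON shape are all hypothetical; they do not reflect OneUptime's actual dashboard config format, and variables and Perses output are left out.

```typescript
// Hypothetical spec types for illustration only.
interface ChartSpec {
  metric: string;
  aggregation: string;
  groupBy?: string[];
}

interface RowSpec {
  title: string;
  charts: ChartSpec[];
}

class DashboardBuilder {
  private readonly name: string;
  private readonly rows: RowSpec[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // Starts a new row; later addChart() calls attach to it.
  addRow(title: string): this {
    this.rows.push({ title, charts: [] });
    return this;
  }

  // Adds a chart to the most recently added row.
  addChart(chart: ChartSpec): this {
    const current = this.rows[this.rows.length - 1];
    if (!current) {
      throw new Error("addChart() called before addRow()");
    }
    current.charts.push(chart);
    return this;
  }

  // Returns a copy so callers cannot mutate the builder's internal state.
  toJSON(): { name: string; rows: RowSpec[] } {
    return {
      name: this.name,
      rows: this.rows.map((r) => ({ title: r.title, charts: r.charts.slice() })),
    };
  }
}
```

Returning `this` from each method is what allows the chained call style, and a plain-object `toJSON()` means the result can be passed straight to `JSON.stringify` before being sent to the Dashboard API.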
### 4.6 Anomaly Detection Overlays
**Current**: No anomaly visualization.
**Target**: AI highlights anomalous data points on charts without manual threshold configuration.
**Implementation** (depends on Metrics roadmap Phase 3.1 - Anomaly Detection):
- Automatically overlay expected range bands (baseline +/- N sigma) on metric charts
- Highlight data points outside the expected range with color indicators
- Click an anomaly to see correlated changes across metrics, logs, and traces
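The "baseline +/- N sigma" band can be sketched with a plain mean and standard deviation over a baseline window. This is an illustrative simplification, not the method the Metrics roadmap will necessarily use: real detection would likely account for seasonality and trend, and the `expectedBand` and `findAnomalies` names are hypothetical.

```typescript
interface Band {
  lower: number;
  upper: number;
}

// Expected range: mean +/- sigmas * (population standard deviation) of the baseline.
function expectedBand(baseline: number[], sigmas: number): Band {
  if (baseline.length === 0) {
    throw new Error("baseline must contain at least one value");
  }
  const mean = baseline.reduce((sum, v) => sum + v, 0) / baseline.length;
  const variance =
    baseline.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) /
    baseline.length;
  const spread = sigmas * Math.sqrt(variance);
  return { lower: mean - spread, upper: mean + spread };
}

// Indices of points that fall strictly outside the expected band.
function findAnomalies(values: number[], band: Band): number[] {
  const indices: number[] = [];
  values.forEach((v, i) => {
    if (v < band.lower || v > band.upper) {
      indices.push(i);
    }
  });
  return indices;
}
```

The band would be drawn as a shaded region behind the series, and the returned indices would mark which points receive the highlight color.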
### 4.7 Terraform / OpenTofu Provider
**Current**: No infrastructure-as-code support for dashboards.
**Target**: Manage dashboards via Terraform/OpenTofu for GitOps workflows.
**Implementation**:
- Expose dashboard CRUD via a well-documented REST API (already exists)
- Build a Terraform provider that maps dashboard resources to the API
- Support `oneuptime_dashboard`, `oneuptime_dashboard_variable`, and `oneuptime_dashboard_template` resources
- This complements the Dashboard-as-Code SDK (4.5) — SDK for developers, Terraform for ops teams
---
## Quick Wins (Can Ship This Week) ✅ All Done
1. ~~**Auto-refresh** - Add a simple `setInterval` refresh with dropdown selector in toolbar~~
2. ~~**Full markdown for text widget** - Replace custom formatting with a markdown renderer~~
3. ~~**Legend show/hide** - Add click handler on legend items to toggle series~~
4. ~~**Stacked area chart** - Simple extension of existing line chart with fill~~
5. ~~**Chart zoom** - Enable brush selection on time series charts~~
---
## Recommended Implementation Order
### Phase 0: Architecture Foundation
1. **Phase 1.9** - QueryPlugin interface (enables everything else; do this first)
### Phase 1: Core Features ✅ (Complete — remaining items are partial refinements)
2. ~~**Quick Wins** - Auto-refresh, markdown, legend toggle, stacked area, chart zoom~~
3. ~~**Phase 1.1** - More chart types (Area, Table, Gauge)~~ ✅ (Pie, Heatmap, Histogram rendering deferred)
4. ~~**Phase 1.2** - Template variables with scoping~~
5. ~~**Phase 1.4** - Multiple queries per chart~~ ✅ (Formulas deferred)
6. ~~**Phase 1.6** - Threshold lines & color coding~~ ✅ (Value widget done; chart reference lines deferred)
### Phase 2: Platform Leverage (Differentiators)
7. **Phase 2.1** - Log stream widget (leverages all-in-one platform + QueryPlugin)
8. **Phase 2.2** - Trace list widget (leverages all-in-one platform + QueryPlugin)
9. **Phase 2.3** - Click-to-correlate (major differentiator — no competitor has this built-in)
10. **Phase 2.4** - Annotations / event overlays
11. **Phase 2.5** - Alert integration
### Phase 3: Collaboration
12. **Phase 3.1** - Public/shared dashboards
13. **Phase 3.2** - JSON import/export with Perses & Grafana compatibility
14. **Phase 3.4** - Row/section grouping with decoupled layout
15. **Phase 3.5** - TV/Kiosk mode
16. **Phase 3.3** - Dashboard versioning
17. **Phase 2.6** - SLO widget (depends on SLO/SLI from Metrics roadmap)
### Phase 4: Differentiation
18. **Phase 4.2** - Pre-built dashboard templates
19. **Phase 4.3** - Auto-generated dashboards
20. **Phase 4.1** - AI-powered dashboard creation
21. **Phase 4.4** - Customer-facing dashboards on status pages
22. **Phase 4.5** - Dashboard-as-code SDK (Perses-compatible)
23. **Phase 4.7** - Terraform / OpenTofu provider
24. **Phase 4.6** - Anomaly detection overlays
## Verification
For each feature:
1. Unit tests for new widget types, template variable resolution, CSV export logic, QueryPlugin dispatching
2. Integration tests for new API endpoints (annotations, public dashboards, import/export, Perses/Grafana conversion)
3. Manual verification via the dev server at `https://oneuptimedev.genosyn.com/dashboard/{projectId}/dashboards`
4. Visual regression testing for new chart types (ensure correct rendering across browsers)
5. Performance testing: verify dashboards with 20+ widgets and auto-refresh don't cause excessive API load
6. Test template variables with edge cases: empty results, special characters, multi-value selections
7. Verify public dashboards don't leak private data
8. Test Perses/Grafana import with real-world dashboard exports to validate conversion fidelity
9. Test QueryPlugin interface with mixed query types (metric + log + trace) on a single dashboard