Alerts & Notifications

Configurable alert rules with metric thresholds, event counts, anomaly detection, and status change triggers. Multi-channel notifications via email, Slack, webhooks, and SMS.

Overview

EdgeFlow's alert system monitors device metrics and events, triggering notifications when configurable thresholds are breached. Alert rules support multiple trigger types and can send notifications through email, Slack, webhooks, and SMS channels.

Alert Rule Types

Type              Description                                   Example
metric_threshold  Trigger when a metric exceeds a value         CPU > 90% for 5 minutes
event_count       Trigger when event frequency exceeds a count  More than 10 errors in 1 hour
anomaly           ML-based anomaly detection on metrics         Unusual memory usage pattern
status_change     Trigger on device or flow state change        Device goes offline

Severity Levels

Severity  Description
Critical  Immediate attention required, service impact
High      Significant issue, may cause service degradation
Medium    Notable condition, should be investigated
Low       Minor issue, informational
Info      Informational only, no action needed

Alert States

┌─────────┐   threshold   ┌─────────┐   acknowledge   ┌──────────────┐
│  (none) │──────────────>│ Firing  │────────────────>│ Acknowledged │
└─────────┘    breached   └────┬────┘                 └──────┬───────┘
                               │                             │
                          condition                     auto-resolve
                          clears                             │
                               │                             │
                               ▼                             ▼
                          ┌──────────┐                ┌──────────┐
                          │ Resolved │                │ Resolved │
                          └──────────┘                └──────────┘
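The lifecycle above can be sketched as a minimal state machine. This is an illustration of the diagram only, not EdgeFlow's internal implementation; the state and event names are chosen to mirror the labels in the figure:

```python
# Minimal sketch of the alert lifecycle diagram above.
# States and transitions mirror the figure; invalid events are ignored.

NONE, FIRING, ACKNOWLEDGED, RESOLVED = "none", "firing", "acknowledged", "resolved"

TRANSITIONS = {
    (NONE, "threshold_breached"): FIRING,
    (FIRING, "acknowledge"): ACKNOWLEDGED,
    (FIRING, "condition_clears"): RESOLVED,
    (ACKNOWLEDGED, "auto_resolve"): RESOLVED,
}

class Alert:
    def __init__(self):
        self.state = NONE

    def handle(self, event: str) -> str:
        """Apply an event; transitions not in the diagram leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Note that both paths end in Resolved: a firing alert resolves when its condition clears, and an acknowledged alert resolves automatically.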

Notification Channels

Each alert rule can send notifications to multiple channels simultaneously:

Channel   Description
Email     SMTP-based email notifications with templates
Slack     Slack channel messages via webhook
Webhooks  HTTP POST to custom endpoints
SMS       Text messages via Twilio
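Fan-out to multiple channels can be sketched as a simple dispatch over the rule's channel list. The sender functions and return values below are hypothetical stand-ins for real delivery logic; only the channel-list shape matches the configuration format shown later in this page:

```python
# Sketch of multi-channel fan-out. The channel types match the table
# above; the sender functions here are illustrative placeholders, not
# EdgeFlow's actual delivery code.

def send_email(config, message):
    return f"email to {config['to']}: {message}"

def send_slack(config, message):
    return f"slack via {config['webhook_url']}: {message}"

SENDERS = {"email": send_email, "slack": send_slack}

def notify(channels, message):
    """Deliver one alert message to every configured channel."""
    return [SENDERS[c["type"]](c["config"], message) for c in channels]
```

Each entry in `notification_channels` carries its own `config`, so one rule can mix channels freely.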

Alert Rule Configuration

# Create an alert rule
POST /api/v1/alerts
{
  "name": "High CPU Alert",
  "description": "Alert when CPU exceeds 90% for 5 minutes",
  "type": "metric_threshold",
  "severity": "high",
  "enabled": true,
  "metric": "cpu_usage_percent",
  "condition": "greater_than",
  "threshold": 90,
  "evaluation_interval": 60,
  "evaluation_window": 300,
  "threshold_count": 5,
  "device_id": "dev_abc123",
  "notification_channels": [
    {"type": "email", "config": {"to": "ops@acme.com"}},
    {"type": "slack", "config": {"webhook_url": "https://hooks.slack.com/..."}}
  ],
  "labels": {"team": "ops", "environment": "production"},
  "annotations": {"runbook": "https://wiki.acme.com/high-cpu"}
}

Alert Rule Parameters

Parameter            Description
evaluation_interval  How often the rule is checked (seconds)
evaluation_window    Time window for metric aggregation (seconds)
threshold_count      Number of breaches before alert fires
labels               Key-value metadata for filtering and grouping
annotations          Additional context (runbook URLs, descriptions)
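How these parameters interact can be sketched as a small evaluation function. It assumes breaches must be consecutive before the alert fires, which matches the example rule (interval 60s, count 5, window 300s) but is an assumption about EdgeFlow's actual semantics:

```python
# Illustrative sketch of threshold evaluation: the rule is checked every
# evaluation_interval seconds, and fires once the last `threshold_count`
# samples in the window all breach the threshold. The "consecutive
# breaches" semantics here are an assumption, not confirmed behavior.

def should_fire(samples, threshold, threshold_count, condition="greater_than"):
    """Return True if the most recent `threshold_count` samples all breach."""
    if len(samples) < threshold_count:
        return False
    recent = samples[-threshold_count:]
    if condition == "greater_than":
        return all(s > threshold for s in recent)
    return all(s < threshold for s in recent)
```

With the example rule, five consecutive one-minute samples above 90% would trigger the alert; a single dip below the threshold resets the streak.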

Alert Management

# List all alerts
GET /api/v1/alerts

# Get alert details
GET /api/v1/alerts/:id

# Update alert rule
PUT /api/v1/alerts/:id

# Delete alert rule
DELETE /api/v1/alerts/:id

# Acknowledge a firing alert
POST /api/v1/alerts/:id/acknowledge
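The endpoints above map naturally onto a thin client. This hypothetical sketch only builds the method/path pairs; base URL, authentication, and the HTTP transport are omitted because they are deployment-specific:

```python
# Hypothetical thin client for the alert-management endpoints above.
# It constructs (method, path) pairs only; wiring these to an actual
# HTTP library is left out.

class AlertsClient:
    BASE = "/api/v1/alerts"

    def list_alerts(self):
        return ("GET", self.BASE)

    def get_alert(self, alert_id):
        return ("GET", f"{self.BASE}/{alert_id}")

    def update_alert(self, alert_id):
        return ("PUT", f"{self.BASE}/{alert_id}")

    def delete_alert(self, alert_id):
        return ("DELETE", f"{self.BASE}/{alert_id}")

    def acknowledge(self, alert_id):
        return ("POST", f"{self.BASE}/{alert_id}/acknowledge")
```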

Throttling

Alert notifications include built-in throttling to prevent notification storms. Once an alert fires, duplicate notifications for the same rule are suppressed for a configurable period before re-alerting.
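The suppression behavior can be sketched as a per-rule throttle. The period and clock handling below are illustrative assumptions, not EdgeFlow's actual defaults:

```python
import time

# Sketch of per-rule notification throttling: after a notification is
# sent for a rule, further notifications for that rule are suppressed
# until the throttle period elapses. Period and clock are assumptions.

class Throttle:
    def __init__(self, period_seconds, clock=time.monotonic):
        self.period = period_seconds
        self.clock = clock
        self.last_sent = {}  # rule_id -> timestamp of last notification

    def allow(self, rule_id):
        """Return True if a notification for this rule may be sent now."""
        now = self.clock()
        last = self.last_sent.get(rule_id)
        if last is not None and now - last < self.period:
            return False  # suppressed: still inside the throttle window
        self.last_sent[rule_id] = now
        return True
```

Because suppression is keyed by rule, a storm of breaches on one rule cannot drown out notifications from other rules.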