Overview
EdgeFlow's alert system monitors device metrics and events, triggering notifications
when configurable thresholds are breached. Alert rules support multiple trigger types
and can send notifications through email, Slack, webhooks, and SMS channels.
Alert Rule Types
| Type | Description | Example |
|------|-------------|---------|
| metric_threshold | Trigger when a metric exceeds a value | CPU > 90% for 5 minutes |
| event_count | Trigger when event frequency exceeds a count | More than 10 errors in 1 hour |
| anomaly | ML-based anomaly detection on metrics | Unusual memory usage pattern |
| status_change | Trigger on device or flow state change | Device goes offline |
Severity Levels
| Severity | Description |
|----------|-------------|
| Critical | Immediate attention required, service impact |
| High | Significant issue, may cause service degradation |
| Medium | Notable condition, should be investigated |
| Low | Minor issue, informational |
| Info | Informational only, no action needed |
Alert States
┌─────────┐ threshold ┌─────────┐ acknowledge ┌──────────────┐
│ (none) │──────────────>│ Firing │────────────────>│ Acknowledged │
└─────────┘ breached └────┬────┘ └──────┬───────┘
│ │
condition auto-resolve
clears │
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ Resolved │ │ Resolved │
└──────────┘ └──────────┘
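The lifecycle in the diagram can be modeled as a small state machine. The sketch below is illustrative only: the enum values and transition rules are inferred from the diagram, not from a documented EdgeFlow API.

```python
from enum import Enum

class AlertState(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"

# Transitions taken from the diagram:
#   firing -> acknowledged   (operator acknowledges)
#   firing -> resolved       (condition clears)
#   acknowledged -> resolved (auto-resolve)
TRANSITIONS = {
    AlertState.FIRING: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: set(),
}

def transition(current: AlertState, target: AlertState) -> AlertState:
    """Validate a transition against the diagram; reject anything else."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.value} -> {target.value}")
    return target
```

Resolved is terminal in both branches, so it has no outgoing transitions.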
Notification Channels
Each alert rule can send notifications to multiple channels simultaneously:
| Channel | Description |
|---------|-------------|
| Email | SMTP-based email notifications with templates |
| Slack | Slack channel messages via webhook |
| Webhooks | HTTP POST to custom endpoints |
| SMS | Text messages via Twilio |
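Because a rule can target several channels at once, delivery is naturally a fan-out where one channel's failure should not block the others. A minimal sketch of that fan-out, assuming hypothetical per-channel sender callables (not part of the documented API):

```python
def fan_out(alert: dict, channels: list, senders: dict) -> dict:
    """Send `alert` through every configured channel; record per-channel success.

    `senders` maps a channel type (e.g. "email", "slack") to a
    callable(alert, config).  A raised exception from one sender is
    caught so the remaining channels still receive the notification.
    """
    results = {}
    for channel in channels:
        sender = senders.get(channel["type"])
        if sender is None:
            results[channel["type"]] = False  # unknown channel type
            continue
        try:
            sender(alert, channel.get("config", {}))
            results[channel["type"]] = True
        except Exception:
            results[channel["type"]] = False
    return results
```

The `channels` argument takes the same shape as the `notification_channels` list in the rule configuration below.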
Alert Rule Configuration
# Create an alert rule
POST /api/v1/alerts
{
  "name": "High CPU Alert",
  "description": "Alert when CPU exceeds 90% for 5 minutes",
  "type": "metric_threshold",
  "severity": "high",
  "enabled": true,
  "metric": "cpu_usage_percent",
  "condition": "greater_than",
  "threshold": 90,
  "evaluation_interval": 60,
  "evaluation_window": 300,
  "threshold_count": 5,
  "device_id": "dev_abc123",
  "notification_channels": [
    {"type": "email", "config": {"to": "ops@acme.com"}},
    {"type": "slack", "config": {"webhook_url": "https://hooks.slack.com/..."}}
  ],
  "labels": {"team": "ops", "environment": "production"},
  "annotations": {"runbook": "https://wiki.acme.com/high-cpu"}
}
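The request above can be issued with Python's standard library. The helper below only builds the request object, so it can be inspected before sending; the base URL and bearer-token auth header are assumptions for illustration, not part of the documented API.

```python
import json
import urllib.request

# Hypothetical deployment values -- substitute your own.
BASE_URL = "https://edgeflow.example.com"
API_TOKEN = "YOUR_API_TOKEN"

def build_create_request(rule: dict) -> urllib.request.Request:
    """Build the POST request for /api/v1/alerts."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/alerts",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )
```

Pass the JSON body shown above as `rule`; `urllib.request.urlopen(build_create_request(rule))` then performs the call.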
Alert Rule Parameters
| Parameter | Description |
|-----------|-------------|
| evaluation_interval | How often the rule is checked (seconds) |
| evaluation_window | Time window for metric aggregation (seconds) |
| threshold_count | Number of breaches before the alert fires |
| labels | Key-value metadata for filtering and grouping |
| annotations | Additional context (runbook URLs, descriptions) |
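With the example rule above (`evaluation_interval: 60`, `evaluation_window: 300`, `threshold_count: 5`), the metric is checked once a minute and the alert fires after five breaches, matching the 5-minute window. A minimal sketch of the breach-counting logic, under the assumption that the breaches must occur in consecutive evaluations (the actual aggregation inside the window may differ):

```python
from collections import deque

def should_fire(samples, threshold, threshold_count):
    """Return True once `threshold_count` consecutive evaluations breach.

    `samples` holds one metric value per evaluation_interval tick.
    """
    recent = deque(maxlen=threshold_count)
    for value in samples:
        recent.append(value > threshold)
        if len(recent) == threshold_count and all(recent):
            return True
    return False
```

A single sub-threshold reading inside the run resets the count, which keeps brief spikes from paging anyone.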
Alert Management
# List all alerts
GET /api/v1/alerts
# Get alert details
GET /api/v1/alerts/:id
# Update alert rule
PUT /api/v1/alerts/:id
# Delete alert rule
DELETE /api/v1/alerts/:id
# Acknowledge a firing alert
POST /api/v1/alerts/:id/acknowledge
Throttling
Alert notifications include built-in throttling to prevent notification storms.
Once an alert fires, duplicate notifications for the same rule are suppressed for
a configurable period before re-alerting.
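The suppression logic can be sketched as a per-rule timestamp check. The class and parameter names below are hypothetical (the source does not name the configurable period); an injectable clock makes the behavior testable.

```python
import time

class NotificationThrottle:
    """Suppress duplicate notifications per rule for `suppress_seconds`."""

    def __init__(self, suppress_seconds, clock=time.monotonic):
        self.suppress_seconds = suppress_seconds
        self._clock = clock
        self._last_sent = {}  # rule_id -> last send time

    def allow(self, rule_id):
        """Return True if a notification for this rule may be sent now."""
        now = self._clock()
        last = self._last_sent.get(rule_id)
        if last is not None and now - last < self.suppress_seconds:
            return False  # still inside the suppression window
        self._last_sent[rule_id] = now
        return True
```

Each rule is throttled independently, so a storm on one rule does not delay notifications for unrelated rules.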