Telemetry & Monitoring

Real-time system metrics, flow execution history, resource monitoring, Prometheus export, and WebSocket-based live updates for EdgeFlow devices.

Overview

EdgeFlow collects telemetry on every device in your fleet. Metrics are gathered locally, broadcast in real time over WebSocket, and synced to the cloud for centralized dashboards and alerting.

System Metrics

The resource monitor collects system-level metrics continuously:

| Metric | Source | Description |
| --- | --- | --- |
| CPU Usage | /proc/stat | Per-core and total CPU utilization percentage |
| Memory | /proc/meminfo | Total, used, free, available, cached, and swap |
| Disk | statfs | Total, used, and free disk space per mount |
| Temperature | /sys/class/thermal | CPU/GPU temperature (Raspberry Pi) |
| Uptime | /proc/uptime | System uptime in seconds |
| Load Average | /proc/loadavg | 1-, 5-, and 15-minute load averages |
| Network I/O | /proc/net/dev | Bytes received/transmitted per interface |
| Goroutines | runtime | Count of active Go goroutines |
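To make the CPU Usage row concrete, the sketch below shows the usual way a utilization percentage is derived from two readings of the aggregate `cpu` line in /proc/stat. It is an illustration only, not EdgeFlow's own collector; the function names are hypothetical, and counting iowait as idle time is an assumption.

```python
def parse_cpu_line(line: str) -> tuple:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal
    ticks = [int(v) for v in fields[1:9]]
    idle = ticks[3] + ticks[4]  # assumption: treat iowait as idle time
    return idle, sum(ticks)


def cpu_usage_percent(prev_line: str, curr_line: str) -> float:
    """Utilization between two samples: share of elapsed ticks that were not idle."""
    prev_idle, prev_total = parse_cpu_line(prev_line)
    curr_idle, curr_total = parse_cpu_line(curr_line)
    total_delta = curr_total - prev_total
    if total_delta <= 0:
        return 0.0  # no time elapsed between samples
    busy_delta = total_delta - (curr_idle - prev_idle)
    return 100.0 * busy_delta / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Calling `read_cpu_line()` twice, a second or so apart, and passing both lines to `cpu_usage_percent` yields the total utilization over that interval. The per-core lines (`cpu0`, `cpu1`, ...) have the same field layout.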

REST API

# Get resource statistics
curl http://localhost:8080/api/v1/resources/stats

{
  "cpu": {
    "usage_percent": 23.5,
    "cores": 4
  },
  "memory": {
    "total_bytes": 4294967296,
    "used_bytes": 1073741824,
    "available_bytes": 3221225472,
    "percent": 25.0
  },
  "disk": {
    "total_bytes": 32000000000,
    "used_bytes": 8000000000,
    "free_bytes": 24000000000,
    "percent": 25.0
  },
  "temperature": 45.5,
  "uptime": 86400,
  "load_avg": {
    "1min": 0.5,
    "5min": 0.3,
    "15min": 0.2
  }
}

# Get detailed resource report
curl http://localhost:8080/api/v1/resources/report
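For scripted checks, the stats response can be reduced to a one-line summary. The following is a minimal sketch that assumes the response has exactly the shape shown in the example above; `fetch_stats` and `summarize` are hypothetical helper names, and the report endpoint is not covered because its response format is not shown here.

```python
import json
from urllib.request import urlopen

# Endpoint taken from the curl example above.
STATS_URL = "http://localhost:8080/api/v1/resources/stats"


def fetch_stats(url: str = STATS_URL, timeout: float = 5.0) -> dict:
    """GET the stats endpoint and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(stats: dict) -> str:
    """Format the fields from the example response as a single line."""
    gib = 1024 ** 3
    cpu, mem, disk = stats["cpu"], stats["memory"], stats["disk"]
    return (
        f"cpu {cpu['usage_percent']:.1f}% on {cpu['cores']} cores, "
        f"mem {mem['used_bytes'] / gib:.2f}/{mem['total_bytes'] / gib:.2f} GiB, "
        f"disk {disk['percent']:.1f}% used, "
        f"temp {stats['temperature']:.1f}"
    )
```

On a device running EdgeFlow, `print(summarize(fetch_stats()))` would print the summary for the current reading.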

Flow Metrics

The metrics service tracks flow-level statistics:

| Metric | Description |
| --- | --- |
| total_flows | Total number of flows configured |
| running_flows | Currently executing flows |
| stopped_flows | Flows in stopped state |
| total_executions | Cumulative execution count |
| failed_executions | Executions that resulted in errors |
| success_rate | Percentage of successful executions |
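The relationship between the last three metrics can be sketched as follows. The formula is an assumption based on the descriptions above (successful executions are those that did not fail), and reporting 0 when nothing has run yet is likewise an assumption, not documented behavior.

```python
def success_rate(total_executions: int, failed_executions: int) -> float:
    """Percentage of executions that did not fail."""
    if total_executions <= 0:
        return 0.0  # assumption: no executions yet is reported as 0%
    succeeded = total_executions - failed_executions
    return 100.0 * succeeded / total_executions
```

For example, 1542 total executions with 23 failures gives a success rate of about 98.5%.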

Execution History

Every flow execution is recorded with node-level event tracking. The 100 most recent records are kept in memory as a sliding window: once the limit is reached, each new record displaces the oldest one.
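A bounded buffer of this kind can be sketched with a fixed-length deque. This illustrates the sliding-window behavior only; the class and method names are hypothetical and do not describe EdgeFlow's internal data structure.

```python
from collections import deque

MAX_RECORDS = 100  # limit stated in the text above


class ExecutionHistory:
    """Keeps only the most recent execution records in memory."""

    def __init__(self, max_records: int = MAX_RECORDS) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended to a full buffer.
        self._records = deque(maxlen=max_records)

    def add(self, record: dict) -> None:
        self._records.append(record)

    def recent(self) -> list:
        """Return the retained records, newest first."""
        return list(reversed(self._records))
```

Because the window lives in memory, history older than the window is no longer available from it.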

# Get execution history
curl http://localhost:8080/api/v1/executions

{
  "executions": [
    {
      "id": "exec_xyz789",
      "flow_id": "flow_abc123",
      "flow_name": "Temperature Monitor",
      "status": "completed",
      "start_time": "2026-02-21T12:00:00Z",
      "end_time": "2026-02-21T12:00:01Z",
      "duration": 1234,
      "node_count": 5,
      "completed_nodes": 5,
      "error_nodes": 0,
      "node_events": [
        {
          "node_id": "node_001",
          "node_name": "DHT22 Sensor",
          "node_type": "dht",
          "status": "success",
          "execution_time": 45,
          "timestamp": 1771675200
        },
        {
          "node_id": "node_002",
          "node_name": "MQTT Publish",
          "node_type": "mqtt_out",
          "status": "success",
          "execution_time": 12,
          "timestamp": 1771675201
        }
      ]
    }
  ]
}
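The node-level events make it straightforward to find where time is spent in a flow. The sketch below assumes the response shape shown above; the helper names are hypothetical, and the units of `execution_time` are not stated in this document, so values are only compared, not converted.

```python
from typing import Optional


def slowest_node(execution: dict) -> Optional[dict]:
    """Return the node event with the largest execution_time, or None if there are no events."""
    events = execution.get("node_events", [])
    return max(events, key=lambda e: e["execution_time"], default=None)


def executions_with_errors(payload: dict) -> list:
    """Return the execution records that report at least one failed node."""
    return [e for e in payload.get("executions", []) if e.get("error_nodes", 0) > 0]
```

Applied to the example response, `slowest_node` picks the DHT22 Sensor node (45 versus 12), and `executions_with_errors` returns an empty list.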

Real-time WebSocket Events

Connect to ws://localhost:8080/ws to receive live telemetry updates. The WebSocket hub broadcasts the following event types:

| Event Type | Trigger | Data |
| --- | --- | --- |
| flow_status | Flow start/stop/create/delete | Flow ID, name, status |
| node_status | Node add/remove/update | Node ID, type, status |
| execution | Each node execution step | Node ID, input/output, timing, status |
| log | Application log events | Level, message, source, fields |
| notification | System alerts | Title, message, severity |
| gpio_state | GPIO pin change (200 ms poll) | Pin number, mode, value |

WebSocket Message Format

{
  "type": "execution",
  "timestamp": "2026-02-21T12:00:00Z",
  "data": {
    "flow_id": "flow_abc123",
    "node_id": "node_001",
    "node_name": "DHT22 Sensor",
    "node_type": "dht",
    "input": {"payload": "trigger"},
    "output": {"temperature": 22.5, "humidity": 65.0},
    "status": "success",
    "execution_time": 45,
    "timestamp": 1771675200
  }
}
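A client typically routes each incoming message by its `type` field. The following is a minimal sketch of such a router, assuming every message follows the envelope shown above (`type`, `timestamp`, `data`). The transport is deliberately left out: `raw` stands for a text frame received from ws://localhost:8080/ws by whichever WebSocket client library you use, and `EventRouter` is a hypothetical name.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class EventRouter:
    """Dispatches decoded WebSocket messages to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers = {}  # event type -> list of handlers

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, raw: str) -> bool:
        """Decode one message and call matching handlers; return False if none matched."""
        msg = json.loads(raw)
        handlers = self._handlers.get(msg.get("type"), [])
        for handler in handlers:
            handler(msg.get("data", {}))
        return bool(handlers)
```

Registering `router.on("execution", print)` and passing each received frame to `router.dispatch` would print the `data` object of every execution event while ignoring the other event types.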

Prometheus Export

EdgeFlow exports metrics in Prometheus text format for integration with Prometheus, Grafana, and other monitoring stacks:

# HELP edgeflow_flows_total Total number of flows
# TYPE edgeflow_flows_total gauge
edgeflow_flows_total 12

# HELP edgeflow_flows_running Currently running flows
# TYPE edgeflow_flows_running gauge
edgeflow_flows_running 5

# HELP edgeflow_executions_total Total executions
# TYPE edgeflow_executions_total counter
edgeflow_executions_total 1542

# HELP edgeflow_executions_failed Failed executions
# TYPE edgeflow_executions_failed counter
edgeflow_executions_failed 23

# HELP edgeflow_nodes_total Total configured nodes
# TYPE edgeflow_nodes_total gauge
edgeflow_nodes_total 87

# HELP edgeflow_uptime_seconds System uptime
# TYPE edgeflow_uptime_seconds gauge
edgeflow_uptime_seconds 86400

# HELP edgeflow_cpu_usage_percent CPU usage
# TYPE edgeflow_cpu_usage_percent gauge
edgeflow_cpu_usage_percent 23.5

# HELP edgeflow_memory_used_bytes Memory usage
# TYPE edgeflow_memory_used_bytes gauge
edgeflow_memory_used_bytes 1073741824

# HELP edgeflow_api_requests_total Total API requests
# TYPE edgeflow_api_requests_total counter
edgeflow_api_requests_total 15234

# HELP edgeflow_api_response_time_avg Average response time (ms)
# TYPE edgeflow_api_response_time_avg gauge
edgeflow_api_response_time_avg 12.5
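Outside a full Prometheus setup, the text format is simple enough to read directly. The sketch below parses label-free samples like those above into a dictionary; it is not a complete parser for the exposition format (labels, histograms, and escaping are ignored), and the function name is hypothetical. How the text is obtained is left to the caller, since the export path is not stated here.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse 'name value' sample lines, skipping blanks and # HELP / # TYPE comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # not a 'name value' sample line
        metrics[parts[0]] = float(parts[1])
    return metrics
```

With the sample output above, the failure ratio could then be computed as `m["edgeflow_executions_failed"] / m["edgeflow_executions_total"]`, where `m` is the parsed dictionary.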

Cloud Telemetry Sync

When a device is connected to the SaaS platform, its metrics are synced to the cloud automatically:

  • On Connect: a full system metrics report is sent immediately
  • Periodic Sync: shadow updates include system state every 5 minutes
  • On Demand: the cloud can query metrics via the get_system_metrics command
  • Execution Events: flow execution records are available via the get_executions command

Resource Alerts

The resource monitor raises alerts when thresholds are exceeded:

| Alert | Threshold | Severity |
| --- | --- | --- |
| High CPU | > 90% for 5 minutes | Warning |
| Low Memory | < 100 MB free | Critical |
| Memory Soft Limit | > 4 GB used | Warning |
| Memory Hard Limit | > 8 GB used | Critical (auto-disable modules) |
| High Temperature | > 80°C | Warning |
| Disk Full | > 90% used | Critical |
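The thresholds above can be expressed as a single check over one stats reading. This is a sketch under stated assumptions, not the resource monitor's implementation: it reads the field names from the stats response shown earlier, uses `available_bytes` for the "free" memory check, treats MB and GB as binary units, reports only the hard limit when both memory limits are exceeded, and takes the CPU readings from the last 5 minutes as a caller-supplied list. The function name is hypothetical.

```python
MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3


def evaluate_alerts(stats: dict, cpu_window: list) -> list:
    """Return (alert, severity) pairs for every threshold the reading exceeds.

    cpu_window holds the CPU usage samples collected over the last 5 minutes.
    """
    alerts = []
    # High CPU requires every sample in the window to be above 90%.
    if cpu_window and min(cpu_window) > 90.0:
        alerts.append(("High CPU", "warning"))
    mem = stats["memory"]
    if mem["available_bytes"] < 100 * MB:
        alerts.append(("Low Memory", "critical"))
    if mem["used_bytes"] > 8 * GB:
        alerts.append(("Memory Hard Limit", "critical"))
    elif mem["used_bytes"] > 4 * GB:
        alerts.append(("Memory Soft Limit", "warning"))
    if stats["temperature"] > 80.0:
        alerts.append(("High Temperature", "warning"))
    if stats["disk"]["percent"] > 90.0:
        alerts.append(("Disk Full", "critical"))
    return alerts
```

Requiring the whole window to exceed the limit is one way to model "for 5 minutes"; a short spike above 90% would not raise the alert.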