System Monitor

Real-time system metrics, resource monitoring, and performance analysis for EdgeFlow

EdgeFlow includes a monitoring system that tracks hardware resources, flow execution metrics, and system health in real time. You can monitor CPU, memory, disk, temperature, and network usage from the built-in dashboard.

Monitor Overview

The System Monitor dashboard provides an at-a-glance view of your device's health and performance. Circular gauges show real-time utilization with color-coded thresholds that shift from green to yellow to red as resources approach their limits.

Example dashboard readings:

CPU       23%
Memory    156 MB / 1 GB
Disk      45%
Temp      52°C
Board     Raspberry Pi 4 Model B
Uptime    12d 5h 23m
Hostname  edgeflow

System Metrics

EdgeFlow collects system metrics from the Linux /proc and /sys filesystems, providing detailed insight into hardware utilization without external dependencies.

CPU Monitoring

CPU metrics are gathered from Go's runtime package and /proc/loadavg, polled every 3 seconds by default.

Example readings: CPU usage 23%; load averages 0.45 (1m), 0.38 (5m), 0.32 (15m).

Metric        Source         Update Interval
CPU Usage %   Runtime        3s
Load Average  /proc/loadavg  3s
Core Count    Runtime        Static
Goroutines    Runtime        3s
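
As an illustration of the load-average source above, here is a minimal Go sketch that parses /proc/loadavg. The type and function names (LoadAvg, parseLoadAvg) are hypothetical and are not taken from the EdgeFlow codebase; the only assumption about the file is the standard Linux layout, where the first three fields are the 1-, 5-, and 15-minute averages.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// LoadAvg holds the 1-, 5-, and 15-minute load averages.
// Hypothetical type, used only for this example.
type LoadAvg struct {
	One, Five, Fifteen float64
}

// parseLoadAvg parses the contents of /proc/loadavg. The first three
// whitespace-separated fields are the 1m, 5m, and 15m load averages.
func parseLoadAvg(content string) (LoadAvg, error) {
	fields := strings.Fields(content)
	if len(fields) < 3 {
		return LoadAvg{}, fmt.Errorf("unexpected loadavg format: %q", content)
	}
	var vals [3]float64
	for i := 0; i < 3; i++ {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return LoadAvg{}, fmt.Errorf("field %d: %w", i, err)
		}
		vals[i] = v
	}
	return LoadAvg{One: vals[0], Five: vals[1], Fifteen: vals[2]}, nil
}

func main() {
	data, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		fmt.Println("could not read /proc/loadavg:", err)
		return
	}
	la, err := parseLoadAvg(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("load: %.2f %.2f %.2f\n", la.One, la.Five, la.Fifteen)
}
```

A poller would call this on a timer (every 3 seconds by default, per the table above) and publish the result to the dashboard.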

Memory Usage

Memory information is read from /proc/meminfo and displayed as a segmented breakdown bar. The monitor tracks total, used, free, and available memory, as well as swap usage.

Example breakdown bar: Used (358 MB), Cached (256 MB), Available (410 MB)

Total       1024 MB
Used        358 MB
Free        154 MB
Available   410 MB
Swap Total  512 MB
Swap Used   24 MB
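
The following Go sketch shows one way to read the fields behind this breakdown. The helper names (parseMeminfo, swapUsedKB) are hypothetical. It relies only on the standard /proc/meminfo line format ("Name: value kB"); how EdgeFlow itself derives the "Used" figure is not stated here, so the example does not attempt to reproduce it.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo parses /proc/meminfo-style text into a map of field name
// to value in kB. Lines look like "MemTotal:        1048576 kB".
func parseMeminfo(content string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			continue // skip lines whose value is not a plain integer
		}
		out[strings.TrimSpace(name)] = v
	}
	return out
}

// swapUsedKB returns SwapTotal - SwapFree, or 0 if the fields are
// missing or inconsistent.
func swapUsedKB(m map[string]uint64) uint64 {
	if m["SwapFree"] > m["SwapTotal"] {
		return 0
	}
	return m["SwapTotal"] - m["SwapFree"]
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("could not read /proc/meminfo:", err)
		return
	}
	m := parseMeminfo(string(data))
	fmt.Printf("total=%d MB available=%d MB swap used=%d MB\n",
		m["MemTotal"]/1024, m["MemAvailable"]/1024, swapUsedKB(m)/1024)
}
```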

Disk Usage

Disk utilization is reported per-partition. EdgeFlow monitors the root filesystem and any mounted data partitions.

Partition  Total    Used     Available  Use%
/          29.7 GB  13.4 GB  14.8 GB    45%
/boot      256 MB   72 MB    184 MB     28%

Temperature Monitoring

On Raspberry Pi and other supported boards, EdgeFlow reads the CPU temperature from /sys/class/thermal/thermal_zone0/temp. The value is displayed as a gauge with color-coded zones indicating thermal state.

Zone         Range
Normal       < 60°C
Warm         60-70°C
Hot          70-80°C
Throttling   ≥ 80°C

Example reading: 52°C
Thermal Throttling Warning

When the CPU temperature reaches 80°C or above, the Raspberry Pi firmware automatically throttles the CPU frequency to prevent damage. This can significantly reduce EdgeFlow performance. Consider adding a heatsink or fan if temperatures regularly exceed 70°C.
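
A minimal Go sketch of reading the sensor and mapping it to the zones above. The Linux thermal sysfs file reports millidegrees Celsius (for example, 52100 for 52.1°C), so the raw value is divided by 1000. The function names are hypothetical, and treating each upper bound as exclusive (exactly 70°C counts as Hot) is an assumption, since the ranges above overlap at their edges.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMilliCelsius converts a sysfs thermal reading (millidegrees
// Celsius, e.g. "52100\n") to degrees Celsius.
func parseMilliCelsius(raw string) (float64, error) {
	n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(n) / 1000.0, nil
}

// thermalZone maps a temperature to the zone names listed in this section.
// Boundary handling is an assumption: each lower bound is inclusive.
func thermalZone(celsius float64) string {
	switch {
	case celsius >= 80:
		return "Throttling"
	case celsius >= 70:
		return "Hot"
	case celsius >= 60:
		return "Warm"
	default:
		return "Normal"
	}
}

func main() {
	data, err := os.ReadFile("/sys/class/thermal/thermal_zone0/temp")
	if err != nil {
		fmt.Println("temperature sensor not available:", err)
		return
	}
	c, err := parseMilliCelsius(string(data))
	if err != nil {
		fmt.Println("unexpected sensor value:", err)
		return
	}
	fmt.Printf("%.1f°C (%s)\n", c, thermalZone(c))
}
```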

Network

Network statistics are read from /proc/net/dev and polled at the same interval as other system metrics. Speed is calculated as the byte delta between consecutive polls divided by the time elapsed between them.

Interface  RX Bytes  TX Bytes  Speed
eth0       1.24 GB   856 MB    2.4 MB/s
wlan0      342 MB    128 MB    450 KB/s
lo         56 MB     56 MB     1.2 MB/s

Monitoring API Endpoints

All monitoring data is accessible through REST API endpoints. These are the same endpoints used by the built-in dashboard and can be consumed by external tools.

Method  Endpoint                        Description
GET     /api/v1/resources/stats         CPU, memory, disk, goroutines, system info
GET     /api/v1/resources/report        Module manager format (MB units)
GET     /api/v1/system/info             Hostname, OS, arch, board model, uptime, temperature, load avgs, swap
GET     /api/v1/system/network          Network interfaces with IPv4/IPv6, MAC, MTU
GET     /api/v1/system/wifi/scan        WiFi networks (SSID, signal, security, channel)
POST    /api/v1/system/reboot           Reboot system (Linux only, requires sudo)
POST    /api/v1/system/restart-service  Restart EdgeFlow service

Example Response

GET /api/v1/system/info

Response:
{
  "hostname": "edgeflow",
  "os": "linux",
  "arch": "arm64",
  "board": "Raspberry Pi 4 Model B",
  "uptime": "12d 5h 23m",
  "temperature": 52.1,
  "load_avg": [0.45, 0.38, 0.32],
  "memory": {
    "total": 1073741824,
    "used": 163577856
  }
}
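
An external tool could decode this response along the following lines. The struct is a sketch derived only from the fields visible in the example above; the real response may contain more fields (the endpoint table also mentions swap), and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SystemInfo mirrors the fields shown in the example response.
// Hypothetical client-side type, not part of EdgeFlow.
type SystemInfo struct {
	Hostname    string    `json:"hostname"`
	OS          string    `json:"os"`
	Arch        string    `json:"arch"`
	Board       string    `json:"board"`
	Uptime      string    `json:"uptime"`
	Temperature float64   `json:"temperature"`
	LoadAvg     []float64 `json:"load_avg"`
	Memory      struct {
		Total uint64 `json:"total"` // bytes
		Used  uint64 `json:"used"`  // bytes
	} `json:"memory"`
}

// sampleResponse is the example body from this section.
const sampleResponse = `{
  "hostname": "edgeflow",
  "os": "linux",
  "arch": "arm64",
  "board": "Raspberry Pi 4 Model B",
  "uptime": "12d 5h 23m",
  "temperature": 52.1,
  "load_avg": [0.45, 0.38, 0.32],
  "memory": { "total": 1073741824, "used": 163577856 }
}`

// decodeSystemInfo unmarshals a /api/v1/system/info response body.
func decodeSystemInfo(body []byte) (SystemInfo, error) {
	var info SystemInfo
	err := json.Unmarshal(body, &info)
	return info, err
}

func main() {
	info, err := decodeSystemInfo([]byte(sampleResponse))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	usedMB := info.Memory.Used / (1024 * 1024)
	fmt.Printf("%s (%s): %.1f°C, %d MB used\n",
		info.Hostname, info.Board, info.Temperature, usedMB)
}
```

In a real client the body would come from an HTTP GET against your device's address rather than from a constant.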

Resource Limits & Auto-Management

EdgeFlow defines soft and hard resource thresholds in its ResourceLimits configuration. When a threshold is crossed, the system takes progressively stronger action, from logging warnings to automatically disabling non-essential modules.

Threshold             Action                              Default
Memory Soft Limit     Warning, log message                80%
Memory Hard Limit     Auto-disable non-essential modules  90%
Disk Warning          Log warning                         85%
Disk Critical         Alert notification                  95%
Low Memory Threshold  Prevent module loading              50 MB available

When memory usage exceeds the hard limit, EdgeFlow automatically disables non-essential modules to reclaim resources. The CanLoadModule() function checks available memory before allowing new modules to be loaded, preventing out-of-memory situations on constrained devices.
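
The threshold logic described above might look roughly like this in Go. This is a sketch only: the struct fields, the action strings, and the lowercase canLoadModule helper are hypothetical stand-ins, not EdgeFlow's actual ResourceLimits type or CanLoadModule() signature. Only the default values come from the table, and treating "exceeds" as a strict comparison is an assumption.

```go
package main

import "fmt"

// ResourceLimits holds the default memory thresholds from the table.
// Field names are hypothetical; only the values come from the docs.
type ResourceLimits struct {
	MemorySoftPercent float64
	MemoryHardPercent float64
	LowMemoryMB       uint64
}

func defaultLimits() ResourceLimits {
	return ResourceLimits{MemorySoftPercent: 80, MemoryHardPercent: 90, LowMemoryMB: 50}
}

// memoryAction returns the action for a given memory utilization.
// The action names are illustrative labels, not EdgeFlow identifiers.
func memoryAction(l ResourceLimits, usedPercent float64) string {
	switch {
	case usedPercent > l.MemoryHardPercent:
		return "disable-non-essential"
	case usedPercent > l.MemorySoftPercent:
		return "warn"
	default:
		return "ok"
	}
}

// canLoadModule reports whether enough memory is available to load a
// new module, using the low-memory threshold.
func canLoadModule(l ResourceLimits, availableMB uint64) bool {
	return availableMB >= l.LowMemoryMB
}

func main() {
	l := defaultLimits()
	for _, p := range []float64{35, 82, 93} {
		fmt.Printf("memory at %.0f%%: %s\n", p, memoryAction(l, p))
	}
	fmt.Println("can load with 410 MB available:", canLoadModule(l, 410))
	fmt.Println("can load with 40 MB available:", canLoadModule(l, 40))
}
```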

Proactive Resource Management

EdgeFlow proactively manages resources to prevent system instability on constrained devices. Modules are gracefully stopped and can be re-enabled once resources are available again.

Health Check System

The health check system runs periodic checks against critical subsystems and reports an aggregate status. Each check returns one of three severity levels:

healthy    All checks passing, system operating normally
degraded   Some checks show warnings, system functional but under stress
unhealthy  Critical check failing, immediate attention required

Built-in Health Checks

Check       What It Tests           Healthy  Degraded   Unhealthy
Database    DB connection ping      OK       -          Fail
Disk Space  Filesystem usage        < 85%    85-95%     > 95%
Memory      RAM utilization         < 90%    > 90%      -
Goroutines  Active goroutine count  Normal   Excessive  -

The health endpoint is available at GET /api/v1/health and returns the aggregate status along with individual check results. Periodic checks run automatically on a configurable interval (default: 30 seconds).

GET /api/v1/health

{
  "status": "healthy",
  "checks": {
    "database": { "status": "healthy", "latency_ms": 2 },
    "disk": { "status": "healthy", "usage_percent": 45 },
    "memory": { "status": "healthy", "usage_percent": 35 },
    "goroutines": { "status": "healthy", "count": 42 }
  }
}
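
One plausible way to derive the aggregate status is to report the worst individual result. The Go sketch below assumes that rule, which is consistent with the level descriptions but is not confirmed by this documentation; the type and function names are hypothetical. The disk thresholds follow the built-in checks table.

```go
package main

import "fmt"

// Status is one of the three severity levels used by the health checks.
type Status string

const (
	Healthy   Status = "healthy"
	Degraded  Status = "degraded"
	Unhealthy Status = "unhealthy"
)

// severity ranks statuses so they can be compared.
func severity(s Status) int {
	switch s {
	case Unhealthy:
		return 2
	case Degraded:
		return 1
	default:
		return 0
	}
}

// aggregate returns the worst status among all checks (assumed rule).
func aggregate(checks map[string]Status) Status {
	worst := Healthy
	for _, s := range checks {
		if severity(s) > severity(worst) {
			worst = s
		}
	}
	return worst
}

// diskStatus applies the disk-space thresholds from the table:
// below 85% healthy, 85-95% degraded, above 95% unhealthy.
func diskStatus(usagePercent float64) Status {
	switch {
	case usagePercent > 95:
		return Unhealthy
	case usagePercent >= 85:
		return Degraded
	default:
		return Healthy
	}
}

func main() {
	checks := map[string]Status{
		"database": Healthy,
		"disk":     diskStatus(88),
		"memory":   Healthy,
	}
	fmt.Println("disk:", checks["disk"])
	fmt.Println("aggregate:", aggregate(checks))
}
```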

Flow & Execution Metrics

EdgeFlow exposes Prometheus-format metrics for flow execution, node activity, and system performance. These metrics can be scraped by Prometheus and visualized in Grafana.

Metric                         Type     Description
edgeflow_flows_total           counter  Total flows created
edgeflow_flows_running         gauge    Currently running flows
edgeflow_executions_total      counter  Total flow executions
edgeflow_executions_failed     counter  Failed executions
edgeflow_nodes_total           gauge    Total registered nodes
edgeflow_nodes_active          gauge    Currently active nodes
edgeflow_uptime_seconds        gauge    Server uptime
edgeflow_memory_used_bytes     gauge    Memory usage
edgeflow_goroutines            gauge    Active goroutines
edgeflow_api_requests_total    counter  Total API requests
edgeflow_api_errors_total      counter  API errors
edgeflow_api_response_time_ms  gauge    Average response time
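
For quick checks without a Prometheus server, the text output can be read directly. The Go sketch below parses simple, label-free samples ("name value" lines, skipping "#" comment lines) and computes the share of failed executions from two of the counters above. The helper names and the sample values are made up for illustration, and the parser deliberately ignores labeled samples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads Prometheus text-format lines of the form
// "metric_name value" into a map. Comment lines and samples that carry
// labels are skipped; this is not a full parser.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		f := strings.Fields(line)
		if len(f) < 2 || strings.Contains(f[0], "{") {
			continue
		}
		v, err := strconv.ParseFloat(f[1], 64)
		if err != nil {
			continue
		}
		out[f[0]] = v
	}
	return out
}

// failureRatio returns failed executions divided by total executions,
// or 0 when no executions have been recorded.
func failureRatio(m map[string]float64) float64 {
	total := m["edgeflow_executions_total"]
	if total == 0 {
		return 0
	}
	return m["edgeflow_executions_failed"] / total
}

func main() {
	// Made-up sample output for illustration.
	text := `# TYPE edgeflow_executions_total counter
edgeflow_executions_total 200
# TYPE edgeflow_executions_failed counter
edgeflow_executions_failed 5
edgeflow_flows_running 3
`
	m := parseSamples(text)
	fmt.Printf("running flows: %.0f\n", m["edgeflow_flows_running"])
	fmt.Printf("failure ratio: %.3f\n", failureRatio(m))
}
```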

Prometheus & Grafana Integration

Enabling Prometheus Metrics

Enable Prometheus metrics export in Settings > Metrics. Once enabled, EdgeFlow exposes a /metrics endpoint on the configured Prometheus port (default: 9090) that returns all metrics in Prometheus text format.

Add EdgeFlow as a scrape target in your prometheus.yml configuration:

scrape_configs:
  - job_name: 'edgeflow'
    static_configs:
      - targets: ['raspberrypi:9090']
    scrape_interval: 15s

Grafana Dashboard

To visualize EdgeFlow metrics in Grafana:

  1. Open Grafana and navigate to Configuration > Data Sources
  2. Click Add data source and select Prometheus
  3. Set the URL to your Prometheus server (e.g., http://localhost:9090)
  4. Click Save & Test to verify the connection
  5. Import the EdgeFlow dashboard template from Dashboards > Import
  6. Select the Prometheus data source and click Import

Community Dashboards

Check grafana.com/dashboards for community-contributed EdgeFlow dashboard templates that include pre-built panels for CPU, memory, flow execution rates, and error tracking.

Log Viewer

The built-in Log Viewer streams real-time log entries via WebSocket, supporting log, flow_status, and node_status event types. It retains the last 500 entries in the browser for scrollback.

Example log stream:
DBG 14:23:01.234 flow-engine Flow "Temperature Monitor" tick completed in 2ms
INF 14:23:01.456 mqtt-broker Client connected: sensor-hub-01
WRN 14:23:02.012 resources Memory usage at 82% - approaching soft limit
ERR 14:23:02.345 node:mqtt-out Connection refused: broker unreachable at 192.168.1.50:1883
OK 14:23:03.678 health All health checks passing

Log Viewer Features

  • Level Filter - Filter by ALL, DEBUG, INFO, WARN, or ERROR
  • Search - Full-text search across log messages
  • Pause / Resume - Freeze the stream to inspect entries
  • Auto-scroll - Automatically scrolls to newest entries when active
  • Retention - Keeps the last 500 entries in the browser
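
The retention and filtering behavior can be illustrated with a small bounded buffer. The Log Viewer itself runs in the browser, so this Go sketch is only a model of the behavior listed above, with hypothetical type and method names; the case-insensitive substring match is also an assumption about how search works.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a single log line. Hypothetical shape for this example.
type Entry struct {
	Level   string // "DEBUG", "INFO", "WARN", or "ERROR"
	Source  string
	Message string
}

// LogBuffer keeps at most max entries, dropping the oldest first.
type LogBuffer struct {
	max     int
	entries []Entry
}

func NewLogBuffer(max int) *LogBuffer {
	if max < 1 {
		max = 1
	}
	return &LogBuffer{max: max}
}

// Add appends an entry, evicting the oldest one when the buffer is full.
func (b *LogBuffer) Add(e Entry) {
	if len(b.entries) == b.max {
		copy(b.entries, b.entries[1:])
		b.entries = b.entries[:b.max-1]
	}
	b.entries = append(b.entries, e)
}

// Filter returns entries matching the level ("ALL" matches every level)
// whose message contains query, compared case-insensitively.
func (b *LogBuffer) Filter(level, query string) []Entry {
	var out []Entry
	q := strings.ToLower(query)
	for _, e := range b.entries {
		if level != "ALL" && e.Level != level {
			continue
		}
		if q != "" && !strings.Contains(strings.ToLower(e.Message), q) {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	buf := NewLogBuffer(500)
	for i := 0; i < 600; i++ {
		buf.Add(Entry{Level: "INFO", Source: "flow-engine", Message: fmt.Sprintf("tick %d", i)})
	}
	buf.Add(Entry{Level: "ERROR", Source: "node:mqtt-out", Message: "Connection refused"})
	fmt.Println("retained:", len(buf.entries))
	fmt.Println("matching errors:", len(buf.Filter("ERROR", "refused")))
}
```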

Log Sources

Source            Description
Backend Logs      Server-side log output (Go runtime, services, handlers)
Flow Events       Flow start, stop, deploy, error events
Node Events       Individual node status changes and output
Frontend Actions  User actions in the web UI (deploy, save, etc.)

Level Indicators

DBG  Debug - Detailed diagnostic information
INF  Info - General operational messages
WRN  Warn - Potential issues requiring attention
ERR  Error - Failures requiring investigation
OK   Success - Successful operations