
Voice-Controlled GPT Assistant

Build a voice-controlled smart home assistant that receives transcribed speech, uses GPT-4 to parse natural language into structured device commands, and publishes actions via MQTT to control lights, thermostats, locks, and other smart devices. Supports freeform commands like "Turn on the living room lights to 80%" or "Set the kitchen thermostat to 22 degrees."

  • Nodes Used: 6
  • Build Time: ~30min
  • AI Model: GPT-4
  • Device Protocol: MQTT

Flow Architecture

[HTTP In: POST /voice] --> [Function: Build GPT Prompt] --> [OpenAI: GPT-4]
  (transcribed text)       (device list + system prompt)    (parse intent)
                                                                 |
                                                                 v
                                                     [Function: Parse Response]
                                                     (extract JSON command)
                                                                 |
                                                                 v
                                                     [Switch: By Device Type]
                                                       /    |    |    \
                                                      v     v    v     v
                                                  [MQTT] [MQTT] [MQTT] [MQTT]
                                               light  thermo  lock   media
                                                  |
                                               [Catch] --> [Debug: Errors]

What You'll Need

Services

  • OpenAI API key (GPT-4 or GPT-4o access)
  • MQTT broker (Mosquitto, EMQX, or HiveMQ)
  • Speech-to-text service (Whisper, Google STT, etc.)

Software

  • EdgeFlow installed and running
  • MQTT-connected smart home devices (or simulators)
  • Optional: Whisper for transcription, either via the OpenAI API or run locally with whisper.cpp

Device Inventory

The device inventory tells GPT-4 which devices exist and how to control them. This list is included in the system prompt so the model knows what commands are valid.

Device      Location      MQTT Topic                    Actions                       Parameters
light       living_room   home/living_room/light        on, off, brightness           brightness: 0-100
light       bedroom       home/bedroom/light            on, off, brightness, color    brightness: 0-100, color: hex
light       kitchen       home/kitchen/light            on, off, brightness           brightness: 0-100
thermostat  kitchen       home/kitchen/thermostat       set_temp, mode                temperature: 15-30, mode: heat/cool/auto
thermostat  living_room   home/living_room/thermostat   set_temp, mode                temperature: 15-30, mode: heat/cool/auto
lock        front_door    home/front_door/lock          lock, unlock                  --
media       living_room   home/living_room/media        play, pause, volume, skip     volume: 0-100
fan         bedroom       home/bedroom/fan              on, off, speed                speed: low/medium/high
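
Keeping the inventory as data makes it easier to keep this table and the system prompt in sync. The sketch below (assumed structure, not an EdgeFlow API) shows three devices from the table and renders the "Available devices" lines the prompt expects; actions and params follow the prompt's format, where brightness is a parameter rather than an action.

```javascript
// Sketch: device inventory as data, so the table and the GPT system
// prompt stay in sync. The shape is illustrative, not an EdgeFlow API.
const devices = [
  { type: "light", location: "living_room", topic: "home/living_room/light",
    actions: ["on", "off"], params: "brightness: 0-100" },
  { type: "thermostat", location: "kitchen", topic: "home/kitchen/thermostat",
    actions: ["set_temp", "mode"], params: "temperature: 15-30, mode: heat|cool|auto" },
  { type: "lock", location: "front_door", topic: "home/front_door/lock",
    actions: ["lock", "unlock"], params: null }
];

// Render one "Available devices" line per device for the system prompt.
function inventoryLines(devs) {
  return devs.map(d => {
    const base = `- ${d.location}/${d.type}: actions=[${d.actions.join(", ")}]`;
    return d.params ? `${base}, params=[${d.params}]` : base;
  }).join("\n");
}

console.log(inventoryLines(devices));
```

Generating the prompt from data means adding a device is a one-line change rather than editing a long string by hand.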

GPT-4 System Prompt

This system prompt instructs GPT-4 to parse natural language voice commands into structured JSON commands. It includes the full device inventory so GPT-4 knows what is available.

You are a smart home voice assistant. Parse the user's voice command into a
JSON action. You must respond with ONLY valid JSON, no explanation or markdown.

Available devices:
- living_room/light: actions=[on, off], params=[brightness: 0-100]
- bedroom/light: actions=[on, off], params=[brightness: 0-100, color: hex]
- kitchen/light: actions=[on, off], params=[brightness: 0-100]
- kitchen/thermostat: actions=[set_temp, mode], params=[temperature: 15-30, mode: heat|cool|auto]
- living_room/thermostat: actions=[set_temp, mode], params=[temperature: 15-30, mode: heat|cool|auto]
- front_door/lock: actions=[lock, unlock]
- living_room/media: actions=[play, pause, volume, skip], params=[volume: 0-100]
- bedroom/fan: actions=[on, off, speed], params=[speed: low|medium|high]

Response format:
{
  "device": "location/type",
  "action": "action_name",
  "params": {},
  "confidence": 0.0-1.0,
  "spoken_response": "Short confirmation message"
}

If the command is ambiguous, pick the most likely device and set confidence < 0.7.
If the command is not about controlling a device, respond with:
{
  "device": "none",
  "action": "none",
  "params": {},
  "confidence": 0,
  "spoken_response": "I can only control smart home devices."
}

Examples:
User: "Turn on the living room lights to 80%"
{"device":"living_room/light","action":"on","params":{"brightness":80},"confidence":0.95,"spoken_response":"Living room lights set to 80%."}

User: "Set kitchen temperature to 22"
{"device":"kitchen/thermostat","action":"set_temp","params":{"temperature":22},"confidence":0.9,"spoken_response":"Kitchen thermostat set to 22 degrees."}

User: "Lock the front door"
{"device":"front_door/lock","action":"lock","params":{},"confidence":0.95,"spoken_response":"Front door locked."}

Step-by-Step Setup

1

Get an OpenAI API Key

Sign up at platform.openai.com, create an API key, and add billing. GPT-4o is recommended for the best balance of speed and quality. Each voice command costs approximately $0.002-$0.005 depending on prompt length.
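
The per-command figure follows from token counts. As a rough sketch, assuming gpt-4o list prices of $2.50 per 1M input tokens and $10.00 per 1M output tokens (check current pricing, these change), a ~600-token prompt plus a ~80-token JSON reply lands near the low end of that range:

```javascript
// Back-of-envelope cost per voice command. Prices are assumptions based
// on published gpt-4o rates and may be out of date.
function costPerCommand(inputTokens, outputTokens) {
  const INPUT_PER_TOKEN = 2.50 / 1e6;   // $ per input token
  const OUTPUT_PER_TOKEN = 10.00 / 1e6; // $ per output token
  return inputTokens * INPUT_PER_TOKEN + outputTokens * OUTPUT_PER_TOKEN;
}

console.log(costPerCommand(600, 80).toFixed(4)); // "0.0023" dollars
```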

2

Set Up MQTT Broker

Install and configure an MQTT broker. Mosquitto is the most common choice:

# Install Mosquitto
sudo apt install mosquitto mosquitto-clients

# Enable and start
sudo systemctl enable mosquitto
sudo systemctl start mosquitto

# Test: subscribe in one terminal
mosquitto_sub -t "home/#" -v

# Publish in another terminal
mosquitto_pub -t "home/living_room/light" \
  -m '{"state":"on","brightness":80}'
3

Define Your Device Inventory

Update the device list in the system prompt to match your actual smart home devices. Each device needs: a location, type, MQTT topic, and available actions with parameters. The more specific the device list, the more accurate GPT-4 will be at parsing commands.

4

Import the Flow

Copy the flow JSON below. In EdgeFlow, go to Menu → Import, paste the JSON, and click Import.

5

Connect Speech-to-Text

The flow expects a POST request to /voice with the transcribed text. You can connect any speech-to-text service. For testing, use curl:

# Test with curl
curl -X POST http://localhost:1880/voice \
  -H "Content-Type: application/json" \
  -d '{"text": "Turn on the living room lights to 80%"}'

# With Whisper (local):
# Record audio, transcribe with Whisper, POST to EdgeFlow
whisper audio.wav --model small --output_format txt
curl -X POST http://localhost:1880/voice \
  -H "Content-Type: application/json" \
  -d "{\"text\": \"$(cat audio.txt)\"}"

Function Node Code

Build GPT Prompt

This function constructs the OpenAI API request with the system prompt and user command:

// Build the GPT-4 prompt with device inventory
var userCommand = (msg.payload && msg.payload.text) || msg.payload;

var systemPrompt = "You are a smart home voice assistant. Parse the user's voice command into a JSON action. You must respond with ONLY valid JSON, no explanation or markdown.\n\nAvailable devices:\n- living_room/light: actions=[on, off], params=[brightness: 0-100]\n- bedroom/light: actions=[on, off], params=[brightness: 0-100, color: hex]\n- kitchen/light: actions=[on, off], params=[brightness: 0-100]\n- kitchen/thermostat: actions=[set_temp, mode], params=[temperature: 15-30, mode: heat|cool|auto]\n- living_room/thermostat: actions=[set_temp, mode], params=[temperature: 15-30, mode: heat|cool|auto]\n- front_door/lock: actions=[lock, unlock]\n- living_room/media: actions=[play, pause, volume, skip], params=[volume: 0-100]\n- bedroom/fan: actions=[on, off, speed], params=[speed: low|medium|high]\n\nResponse format: {\"device\": \"location/type\", \"action\": \"action_name\", \"params\": {}, \"confidence\": 0.0-1.0, \"spoken_response\": \"Short confirmation\"}";

msg.payload = {
    model: "gpt-4o",
    temperature: 0.1,
    max_tokens: 200,
    messages: [
        {
            role: "system",
            content: systemPrompt
        },
        {
            role: "user",
            content: userCommand
        }
    ]
};

// Store original command for logging
msg.originalCommand = userCommand;

return msg;

Parse GPT Response

This function extracts the JSON command from GPT-4's response and prepares the MQTT message:

// Parse GPT-4 response and prepare MQTT command
var response = msg.payload.choices[0].message.content;

// Try to parse the JSON response
var command;
try {
    command = JSON.parse(response);
} catch (e) {
    // Try to extract JSON from markdown code block
    var match = response.match(/{[\s\S]*}/);
    if (match) {
        command = JSON.parse(match[0]);
    } else {
        node.error("Failed to parse GPT response: " + response);
        return null;
    }
}

// Skip if no device match
if (command.device === "none" || command.confidence < 0.3) {
    msg.payload = {
        success: false,
        message: command.spoken_response || "Command not understood",
        originalCommand: msg.originalCommand
    };
    return [null, msg]; // Send to second output (error/log)
}

// Build MQTT topic from device
var topic = "home/" + command.device;

// Build MQTT payload
var mqttPayload = {
    action: command.action,
    ...command.params,
    source: "voice",
    timestamp: Date.now()
};

msg.topic = topic;
msg.payload = JSON.stringify(mqttPayload);

// Add metadata for HTTP response
msg._response = {
    success: true,
    device: command.device,
    action: command.action,
    params: command.params,
    confidence: command.confidence,
    spoken_response: command.spoken_response,
    originalCommand: msg.originalCommand
};

return [msg, null]; // Send to first output (MQTT)

Complete Flow JSON

Copy and import this flow into EdgeFlow via Menu → Import.

{
  "name": "Voice-Controlled GPT Assistant",
  "nodes": [
    {
      "id": "http_in_voice",
      "type": "http-in",
      "name": "POST /voice",
      "method": "post",
      "url": "/voice",
      "x": 120,
      "y": 200
    },
    {
      "id": "func_build_prompt",
      "type": "function",
      "name": "Build GPT Prompt",
      "code": "var userCommand = msg.payload.text || msg.payload;\nvar systemPrompt = 'You are a smart home voice assistant. Parse the user\'s voice command into a JSON action. Respond with ONLY valid JSON.\n\nAvailable devices:\n- living_room/light: actions=[on, off], params=[brightness: 0-100]\n- bedroom/light: actions=[on, off], params=[brightness: 0-100, color: hex]\n- kitchen/light: actions=[on, off], params=[brightness: 0-100]\n- kitchen/thermostat: actions=[set_temp, mode], params=[temperature: 15-30, mode: heat|cool|auto]\n- living_room/thermostat: actions=[set_temp, mode], params=[temperature: 15-30]\n- front_door/lock: actions=[lock, unlock]\n- living_room/media: actions=[play, pause, volume, skip], params=[volume: 0-100]\n- bedroom/fan: actions=[on, off, speed], params=[speed: low|medium|high]\n\nFormat: {"device":"loc/type","action":"name","params":{},"confidence":0-1,"spoken_response":"msg"}';\nmsg.payload = { model: 'gpt-4o', temperature: 0.1, max_tokens: 200, messages: [{role:'system',content:systemPrompt}, {role:'user',content:userCommand}] };\nmsg.originalCommand = userCommand;\nreturn msg;",
      "outputs": 1,
      "x": 360,
      "y": 200
    },
    {
      "id": "openai_gpt4",
      "type": "openai",
      "name": "GPT-4o Parse Intent",
      "apiKey": "YOUR_OPENAI_API_KEY",
      "x": 600,
      "y": 200
    },
    {
      "id": "func_parse_response",
      "type": "function",
      "name": "Parse GPT Response",
      "code": "var response = msg.payload.choices[0].message.content;\nvar command;\ntry { command = JSON.parse(response); } catch(e) { var m = response.match(/{[\\s\\S]*}/); if(m) command = JSON.parse(m[0]); else { node.error('Parse fail: '+response); return null; } }\nif(command.device==='none'||command.confidence<0.3) { msg.payload={success:false,message:command.spoken_response}; return [null,msg]; }\nmsg.topic='home/'+command.device;\nmsg.payload=JSON.stringify({action:command.action,...command.params,source:'voice',timestamp:Date.now()});\nmsg._response={success:true,device:command.device,action:command.action,params:command.params,confidence:command.confidence,spoken_response:command.spoken_response};\nreturn [msg,null];",
      "outputs": 2,
      "x": 840,
      "y": 200
    },
    {
      "id": "mqtt_out_devices",
      "type": "mqtt-out",
      "name": "Publish Command",
      "broker": "mqtt_broker",
      "x": 1100,
      "y": 160
    },
    {
      "id": "mqtt_broker",
      "type": "mqtt-broker-config",
      "name": "Local MQTT",
      "host": "localhost",
      "port": 1883
    },
    {
      "id": "func_http_resp",
      "type": "function",
      "name": "Build HTTP Response",
      "code": "msg.payload = msg._response || { success: true, message: 'Command sent' };\nreturn msg;",
      "x": 1100,
      "y": 240
    },
    {
      "id": "http_response",
      "type": "http-response",
      "name": "Respond",
      "statusCode": 200,
      "x": 1320,
      "y": 240
    },
    {
      "id": "debug_command",
      "type": "debug",
      "name": "Log Command",
      "x": 1100,
      "y": 80
    },
    {
      "id": "debug_errors",
      "type": "debug",
      "name": "Log Errors",
      "x": 1100,
      "y": 320
    },
    {
      "id": "catch_errors",
      "type": "catch",
      "name": "Catch Errors",
      "scope": "all",
      "x": 840,
      "y": 360
    }
  ],
  "connections": [
    { "from": "http_in_voice", "to": "func_build_prompt" },
    { "from": "func_build_prompt", "to": "openai_gpt4" },
    { "from": "openai_gpt4", "to": "func_parse_response" },
    { "from": "func_parse_response", "to": "mqtt_out_devices", "port": 0 },
    { "from": "func_parse_response", "to": "debug_errors", "port": 1 },
    { "from": "mqtt_out_devices", "to": "func_http_resp" },
    { "from": "func_parse_response", "to": "func_http_resp", "port": 0 },
    { "from": "func_http_resp", "to": "http_response" },
    { "from": "mqtt_out_devices", "to": "debug_command" },
    { "from": "catch_errors", "to": "debug_errors" }
  ]
}

Expected Output

Voice command: "Turn on living room lights to 80%"

GPT-4 Response (parsed)

{
  "device": "living_room/light",
  "action": "on",
  "params": {
    "brightness": 80
  },
  "confidence": 0.95,
  "spoken_response": "Living room lights set to 80%."
}

MQTT Message Published

Topic: home/living_room/light
Payload:
{
  "action": "on",
  "brightness": 80,
  "source": "voice",
  "timestamp": 1707753600000
}

HTTP Response

{
  "success": true,
  "device": "living_room/light",
  "action": "on",
  "params": {
    "brightness": 80
  },
  "confidence": 0.95,
  "spoken_response": "Living room lights set to 80%."
}

Example Voice Commands

"Set the kitchen thermostat to 22 degrees"

→ home/kitchen/thermostat → {"action":"set_temp","temperature":22,"source":"voice"}

"Lock the front door"

→ home/front_door/lock → {"action":"lock","source":"voice"}

"Turn the bedroom fan to high"

→ home/bedroom/fan → {"action":"speed","speed":"high","source":"voice"}

"Pause the music in the living room"

→ home/living_room/media → {"action":"pause","source":"voice"}

Troubleshooting

OpenAI API returns 401 Unauthorized

Verify your API key is correct and has not been revoked. Check that billing is set up on your OpenAI account. Ensure the API key has access to the GPT-4 model you specified.

GPT response contains markdown instead of raw JSON

The parse function includes a fallback regex to extract JSON from markdown code blocks. If this keeps happening, add response_format: { "type": "json_object" } to the OpenAI API request body (JSON mode, supported by gpt-4o, gpt-4-turbo, and newer models). This forces the model to emit valid JSON.
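
Concretely, JSON mode is one extra field in the request body built by the "Build GPT Prompt" function node. The snippet below is self-contained only so it runs standalone; in the flow, msg is provided by EdgeFlow:

```javascript
// In the "Build GPT Prompt" function node, add response_format to force
// strict JSON output (Chat Completions JSON mode).
var msg = {}; // provided by EdgeFlow in the real flow; declared here only for standalone use

msg.payload = {
  model: "gpt-4o",
  temperature: 0.1,
  max_tokens: 200,
  response_format: { type: "json_object" }, // forces valid-JSON replies
  messages: [ /* system + user messages as before */ ]
};
```

Note that JSON mode requires the word "JSON" to appear somewhere in the messages, which the system prompt here already satisfies.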

MQTT messages not reaching devices

Subscribe to home/# with mosquitto_sub to verify messages are being published. Check that MQTT topics match your device configuration exactly. Verify the MQTT broker host and port in the EdgeFlow node configuration.
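
A frequent root cause is a subscription filter that does not actually cover the published topic. The sketch below implements the standard MQTT filter rules ("+" matches exactly one level, "#" matches the rest) so you can sanity-check topic/filter pairs offline; it is an illustration of the matching rules, not a broker implementation:

```javascript
// Sketch of MQTT topic-filter matching: "+" matches one level, "#"
// matches all remaining levels.
function topicMatches(filter, topic) {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true;               // matches remaining levels
    if (i >= t.length) return false;             // topic too short
    if (f[i] !== "+" && f[i] !== t[i]) return false; // literal mismatch
  }
  return f.length === t.length;
}

console.log(topicMatches("home/#", "home/living_room/light"));     // true
console.log(topicMatches("home/+/light", "home/kitchen/light"));   // true
console.log(topicMatches("home/+/light", "home/front_door/lock")); // false
```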

GPT misidentifies the target device

Make the device inventory more specific. Add aliases (e.g., "main light" = living_room/light). Lower the temperature to 0.0 for maximum determinism. Add more examples to the system prompt to improve accuracy for your typical commands.
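
Aliases can also be expanded before the transcript ever reaches GPT, which is cheaper and more deterministic than relying on the model. A minimal sketch, with an example alias map you would replace with your own household's phrases:

```javascript
// Sketch: expand household aliases before sending the text to GPT.
// The ALIASES entries are examples; define your own.
const ALIASES = {
  "main light": "living room light",
  "front lock": "front door lock"
};

function normalizeCommand(text) {
  let out = text.toLowerCase();
  for (const [alias, canonical] of Object.entries(ALIASES)) {
    out = out.split(alias).join(canonical); // replace every occurrence
  }
  return out;
}

console.log(normalizeCommand("Turn on the main light")); // "turn on the living room light"
```

This would run in the Build GPT Prompt function, applied to userCommand before it is placed in the messages array.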

High latency on voice commands

GPT-4o typically responds in 0.5-1.5 seconds. If latency is too high, try gpt-4o-mini for faster response at slightly lower accuracy. Reduce max_tokens to 150. Ensure your internet connection is stable.

Next Steps