AI-Powered Smart Doorbell
Build a smart doorbell that captures a visitor photo, uses a local LLM (LLaVA via Ollama) to describe who is at the door, and sends an intelligent notification to your phone via Telegram. All AI processing runs locally on your hardware -- no cloud APIs needed.
Flow Architecture
[GPIO In: Button] --> [HTTP Request: Capture Image] --+
                                                      |
[HTTP In: POST /doorbell] ----------------------------+
                                                      |
                                                      v
                                       [Function: Prepare Image]
                                                      |
                                                      v
                                        [Ollama: LLaVA Vision]
                                       "Describe this visitor"
                                                      |
                                                      v
                                       [Function: Format Message]
                                                      |
                                                      v
                                      [Telegram: Send with Photo]
What You'll Need
Hardware
- Raspberry Pi 4/5 (4GB+ RAM recommended)
- Pi Camera Module or IP camera with a snapshot URL
- Push button + 10kΩ pull-down resistor (for the GPIO trigger)
- Optional: a separate GPU server for faster LLM inference
Software
- EdgeFlow installed and running
- Ollama installed with the LLaVA model pulled
- A Telegram bot (created via @BotFather)
- Optional: motion (for Pi Camera HTTP snapshots)
Step-by-Step Setup
Install Ollama
Install Ollama on the machine that will run the LLM. This can be the Pi itself (slower) or a separate server with a GPU (recommended for faster inference).
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Pull the LLaVA Vision Model
LLaVA is a multimodal model that can understand images. Pull the model -- this may take several minutes depending on your internet speed (the model is approximately 4.7GB).
# Pull LLaVA model (4.7GB)
ollama pull llava
# Test it works
ollama run llava "Describe this image: ./test.jpg"
# The default llava tag is the 7B model; for better accuracy
# (at the cost of speed and memory), try the larger variant:
ollama pull llava:13b
Set Up the Camera
Configure your camera to provide an HTTP snapshot URL. For a Pi Camera, use the motion package. For an IP camera, find the snapshot URL in the camera's documentation.
# Pi Camera with motion
sudo apt install motion
# Edit /etc/motion/motion.conf:
# stream_port 8081
# snapshot_interval 0
# webcontrol_port 8080
sudo systemctl start motion
# Snapshot URL: http://localhost:8080/0/action/snapshot
# Stream URL: http://localhost:8081
# IP Camera examples:
# http://192.168.1.100/snapshot.jpg
# rtsp://user:pass@192.168.1.100:554/stream1
# (note: the HTTP Request node needs an HTTP snapshot URL; an RTSP stream requires a separate frame-grab step)
Create a Telegram Bot
Open Telegram, search for @BotFather, and send /newbot. Follow the prompts to get your bot token. Then send a message to your bot and use the API to find your chat ID.
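The getUpdates response is JSON; if you save it to a file, a few lines of JavaScript can pull the chat ID out for you. A sketch, assuming the standard Bot API response shape (the sample data below is illustrative, not real):

```javascript
// Sketch: extract the chat ID from a saved getUpdates response.
// The shape follows the Telegram Bot API; the sample is illustrative.
const sample = JSON.stringify({
  ok: true,
  result: [
    { update_id: 1, message: { message_id: 7, chat: { id: 123456789, type: "private" }, text: "hi" } }
  ]
});

function extractChatId(json) {
  const data = JSON.parse(json);
  // Take the chat ID from the most recent update that carries a message
  const withMessage = data.result.filter(u => u.message && u.message.chat);
  if (withMessage.length === 0) return null;
  return withMessage[withMessage.length - 1].message.chat.id;
}

console.log(extractChatId(sample)); // 123456789
```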
# After creating the bot, get your chat ID:
curl "https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates"
# Look for: "chat":{"id":123456789,...}
# That number is your chat ID
Import the Flow
Copy the flow JSON below. In EdgeFlow, go to Menu → Import, paste the JSON, and click Import.
Configure and Deploy
Update the following in the flow: the camera snapshot URL, the Ollama host (default: http://localhost:11434), and the Telegram bot token and chat ID. Then click Deploy.
Press the doorbell button or send a POST request to /doorbell to test.
Configuration Details
Ollama Node Configuration
| Property | Value | Notes |
|---|---|---|
| host | http://localhost:11434 | Change if Ollama runs on another machine |
| model | llava | Multimodal vision model |
| temperature | 0.3 | Low for consistent descriptions |
| prompt | (see below) | Custom prompt for doorbell context |
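Under the hood, the Ollama node posts a JSON body to the host's /api/generate endpoint. A sketch of that body (field names follow the Ollama REST API; the base64 string here is a stand-in, not a real image):

```javascript
// Sketch of the request body the Ollama node sends to POST /api/generate.
// Field names follow the Ollama REST API; the image payload is a stand-in.
function buildVisionRequest(base64Image) {
  return {
    model: "llava",          // matches the node's model property
    prompt: "Describe this visitor",
    images: [base64Image],   // base64-encoded JPEG/PNG, no data: URI prefix
    stream: false,           // one complete response instead of chunks
    options: { temperature: 0.3 }
  };
}

const body = buildVisionRequest("aGVsbG8=");
console.log(JSON.stringify(body, null, 2));
```

With stream set to false, the response arrives as a single JSON object whose response field holds the full description, which is what the Format Message function expects.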
LLaVA Prompt Engineering
The prompt is critical for getting useful descriptions. Here is the optimized prompt used in the flow:
You are a smart doorbell assistant. Analyze this doorbell camera image and provide
a brief, useful description of the visitor. Include:
1. Number of people visible
2. Apparent gender and approximate age
3. Notable clothing or accessories
4. Whether they are carrying packages or items
5. Any visible vehicles in the background
6. Overall assessment (delivery person, neighbor, stranger, etc.)
Keep the description to 2-3 sentences. Be factual and concise.
Do not speculate about identity or intentions.
Function Node Code
Prepare Image for LLM
This function takes the captured image (as a buffer) and prepares it for the Ollama vision API:
// Prepare image for Ollama LLaVA model
// Input: msg.payload = image buffer from HTTP request
// Output: msg with base64 image for Ollama node
var imageBuffer = msg.payload;
var base64Image = imageBuffer.toString('base64');
// Store original image for Telegram later
flow.set('doorbell_image', imageBuffer);
flow.set('doorbell_time', new Date().toLocaleString());
msg.payload = {
model: "llava",
prompt: "You are a smart doorbell assistant. Analyze this doorbell camera image and provide a brief description of the visitor. Include number of people, appearance, clothing, packages, and your assessment of who they might be (delivery, neighbor, stranger). Keep it to 2-3 sentences.",
images: [base64Image],
stream: false,
options: {
temperature: 0.3
}
};
return msg;
Format Telegram Message
This function formats the LLM response into a nice Telegram notification:
// Format the Ollama response for Telegram
var description = msg.payload.response || msg.payload;
var timestamp = flow.get('doorbell_time') || new Date().toLocaleString();
var image = flow.get('doorbell_image');
// Build Telegram message
msg.payload = {
type: "photo",
content: image,
caption: "🛎️ *Doorbell Ring*\n"
+ "🕒 " + timestamp + "\n\n"
+ "👤 *Visitor Description:*\n"
+ description + "\n\n"
+ "_AI-powered by LLaVA (local)_",
options: {
parse_mode: "Markdown"
}
};
return msg;
Complete Flow JSON
Copy and import this flow into EdgeFlow via Menu → Import.
{
"name": "AI-Powered Smart Doorbell",
"nodes": [
{
"id": "gpio_button",
"type": "gpio-in",
"name": "Doorbell Button",
"pin": 17,
"edge": "rising",
"debounce": 500,
"x": 120,
"y": 120
},
{
"id": "http_in_doorbell",
"type": "http-in",
"name": "POST /doorbell",
"method": "post",
"url": "/doorbell",
"x": 120,
"y": 240
},
{
"id": "http_capture",
"type": "http-request",
"name": "Capture Image",
"method": "GET",
"url": "http://localhost:8080/0/action/snapshot",
"returnType": "bin",
"x": 360,
"y": 180
},
{
"id": "func_prepare",
"type": "function",
"name": "Prepare Image for LLM",
"code": "var imageBuffer = msg.payload;\nvar base64Image = imageBuffer.toString('base64');\nflow.set('doorbell_image', imageBuffer);\nflow.set('doorbell_time', new Date().toLocaleString());\nmsg.payload = { model: 'llava', prompt: 'You are a smart doorbell assistant. Analyze this doorbell camera image and provide a brief description of the visitor. Include number of people, appearance, clothing, packages, and assessment. Keep to 2-3 sentences.', images: [base64Image], stream: false, options: { temperature: 0.3 } };\nreturn msg;",
"x": 580,
"y": 180
},
{
"id": "ollama_llava",
"type": "ollama",
"name": "LLaVA Vision",
"host": "http://localhost:11434",
"model": "llava",
"x": 800,
"y": 180
},
{
"id": "func_format",
"type": "function",
"name": "Format Message",
"code": "var description = msg.payload.response || msg.payload;\nvar timestamp = flow.get('doorbell_time') || new Date().toLocaleString();\nvar image = flow.get('doorbell_image');\nmsg.payload = { type: 'photo', content: image, caption: 'Doorbell Ring\\n' + timestamp + '\\n\\nVisitor Description:\\n' + description, options: { parse_mode: 'Markdown' } };\nreturn msg;",
"x": 1020,
"y": 180
},
{
"id": "telegram_send",
"type": "telegram",
"name": "Send Notification",
"botToken": "YOUR_BOT_TOKEN",
"chatId": "YOUR_CHAT_ID",
"x": 1240,
"y": 180
},
{
"id": "http_response",
"type": "http-response",
"name": "OK Response",
"statusCode": 200,
"x": 1240,
"y": 280
},
{
"id": "debug_desc",
"type": "debug",
"name": "Log Description",
"x": 1240,
"y": 100
}
],
"connections": [
{ "from": "gpio_button", "to": "http_capture" },
{ "from": "http_in_doorbell", "to": "http_capture" },
{ "from": "http_capture", "to": "func_prepare" },
{ "from": "func_prepare", "to": "ollama_llava" },
{ "from": "ollama_llava", "to": "func_format" },
{ "from": "func_format", "to": "telegram_send" },
{ "from": "func_format", "to": "http_response" },
{ "from": "ollama_llava", "to": "debug_desc" }
]
}
Expected Output
When someone presses the doorbell, you receive a Telegram message like this:
Doorbell Ring
2/12/2026, 2:34:15 PM
Visitor Description:
A middle-aged man wearing glasses and a blue jacket is standing at the front door, carrying
a medium-sized cardboard package. He appears to be a delivery person. No vehicles visible
in the driveway.
AI-powered by LLaVA (local)
The debug node also logs the raw LLM response for review:
{
"payload": {
"model": "llava",
"response": "A middle-aged man wearing glasses and a blue jacket is standing at the front door, carrying a medium-sized cardboard package. He appears to be a delivery person. No vehicles visible in the driveway.",
"done": true,
"total_duration": 2847000000,
"eval_count": 42
}
}
Troubleshooting
Ollama connection refused
Verify Ollama is running with systemctl status ollama. If it runs on a different machine, set the node's host to http://REMOTE_IP:11434 and make sure the firewall allows port 11434. You may also need to set OLLAMA_HOST=0.0.0.0 in the Ollama service environment so it listens on more than localhost.
LLaVA response is very slow
On a Raspberry Pi 4, LLaVA can take 15-30 seconds per image. For faster results, run Ollama on a separate machine with a GPU, and ensure no other heavy processes are consuming memory or CPU on the Pi. Note that the default llava tag is already the smallest (7B) variant; if you pulled llava:13b, switch back to the default.
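Part of a slow first response is model load time: Ollama unloads idle models after a few minutes, so the first doorbell press after a quiet spell pays a reload penalty. The Ollama API accepts a keep_alive field to hold the model in memory longer. A sketch of adding it to the body built in the Prepare Image function (the "30m" value is an arbitrary choice, tune to taste):

```javascript
// Sketch: extend the Ollama request body with keep_alive so the model
// stays resident between doorbell presses. "30m" is an arbitrary choice;
// -1 keeps the model loaded indefinitely.
function withKeepAlive(body, duration) {
  return Object.assign({}, body, { keep_alive: duration });
}

const body = withKeepAlive({ model: "llava", prompt: "...", stream: false }, "30m");
console.log(body.keep_alive); // 30m
```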
Camera snapshot returns error
Test the snapshot URL directly in a browser. For Pi Camera via motion, ensure the service is running and the correct port is configured. For IP cameras, check authentication credentials in the URL. Some cameras require digest auth rather than basic auth.
Telegram photo not sending
Verify the bot token and chat ID are correct. The image buffer must be a valid JPEG or PNG. Check that the Telegram bot has not been blocked. Use the debug node to inspect the image buffer size -- it should be at least a few KB for a valid image.
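One frequent silent failure is a Markdown parse error: LLaVA's description may contain characters like _ or *, which Telegram's legacy Markdown mode treats as formatting and rejects when unbalanced. A small helper (a sketch, not part of the flow above) can escape the description before it goes into the caption:

```javascript
// Sketch: escape the characters Telegram's legacy Markdown parse mode
// treats as formatting, so an LLM description can't break the caption.
function escapeMarkdown(text) {
  return String(text).replace(/([_*`\[])/g, "\\$1");
}

console.log(escapeMarkdown("snake_case and *stars*"));
// snake\_case and \*stars\*
```

Apply it to the description variable in the Format Message function, leaving your own formatting markers (the bold headers) unescaped.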