html

Parse HTML content and extract elements using CSS selectors for web scraping and data extraction.

Overview

The html node parses HTML content and extracts elements using CSS selectors. It is ideal for web scraping, processing HTTP responses, extracting structured data from web pages, and transforming HTML content within your EdgeFlow pipelines.

Features: CSS selector queries, 3 output modes, multi-match support, and attribute extraction.

Properties

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | - | CSS selector to match elements (tag, .class, #id, or combined) |
| output | select | No | "text" | Output type: "text" (inner text), "html" (inner HTML), or "attr" (attribute value) |
| attr | string | No | "" | Attribute name to extract (required when output is "attr") |
| multiple | boolean | No | false | Return all matches as an array instead of the first match only |
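
The one cross-field rule in the table (attr is required when output is "attr") is easy to miss when flows are generated or edited by hand. The following is a minimal sketch of a pre-deployment check, written in Python purely for illustration. The function name `validate_html_node_config` and the error messages are hypothetical and are not part of EdgeFlow.

```python
# Hypothetical helper: checks an html node config against the property table.
# It is not an EdgeFlow API, only an illustration of the documented rules.
VALID_OUTPUTS = {"text", "html", "attr"}


def validate_html_node_config(config: dict) -> list[str]:
    """Return a list of problems found in the config (empty if none)."""
    problems = []

    # selector is the only property marked as required.
    selector = config.get("selector", "")
    if not isinstance(selector, str) or not selector.strip():
        problems.append("selector is required")

    # output defaults to "text" when omitted.
    output = config.get("output", "text")
    if output not in VALID_OUTPUTS:
        problems.append(f"output must be one of {sorted(VALID_OUTPUTS)}")

    # attr defaults to "", which is not usable in attr mode.
    if output == "attr" and not config.get("attr", ""):
        problems.append('attr is required when output is "attr"')

    # multiple defaults to false.
    if not isinstance(config.get("multiple", False), bool):
        problems.append("multiple must be a boolean")

    return problems
```

An empty list means the config is consistent with the table. Anything else can be logged or used to reject the flow before it runs.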

Inputs

msg.payload

An HTML string to parse. Typically the response body from an HTTP request node.

{
  "payload": "<html><head><title>My Page</title></head><body><h1>Hello</h1></body></html>"
}

Outputs

Single Match (multiple: false)

{
  "payload": "Hello"
}

Multiple Matches (multiple: true)

{
  "payload": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ]
}
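
Because the payload is a single string in one mode and an array in the other, downstream logic that may receive either shape can normalize it first. This is a small Python sketch. The helper name `as_list` is hypothetical, and treating a missing payload as an empty list is an assumption of the sketch, since the behavior for zero matches is not described here.

```python
def as_list(payload):
    """Normalize an html node payload to a list, whatever `multiple` was set to."""
    if payload is None:
        # Assumption: a missing payload is treated as "no matches".
        return []
    if isinstance(payload, list):
        # multiple: true already yields an array of matches.
        return payload
    # multiple: false yields a single value; wrap it.
    return [payload]
```

With this in place, the same loop can handle `"Hello"` and `["https://example.com/page1", ...]` without branching on the node's configuration.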

Output Modes

text

Returns the inner text content of matched elements, with HTML tags stripped.

selector: "h1"
input: "<h1>Hello <em>World</em></h1>"
output: "Hello World"

html

Returns the inner HTML of matched elements, preserving nested markup.

selector: "h1"
input: "<h1>Hello <em>World</em></h1>"
output: "Hello <em>World</em>"

attr

Returns the value of a specific attribute from matched elements.

selector: "a"
attr: "href"
input: '<a href="/about">About</a>'
output: "/about"
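
To make the difference between the three modes concrete, here is a rough Python sketch that reproduces the examples above using only the standard library's `html.parser`. It is not how the node is implemented. It supports plain tag selectors and the first match only, and the names `FirstTagExtractor` and `extract_first` are hypothetical. Returning `None` when nothing matches is an assumption of the sketch.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstTagExtractor(HTMLParser):
    """Captures the first element with the given tag name."""

    def __init__(self, tag):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.depth = 0        # > 0 while inside the matched element
        self.matched = False
        self.attrs = {}
        self.text_parts = []  # used for "text" mode
        self.html_parts = []  # used for "html" mode

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Nested markup inside the match is kept for "html" mode.
            self.html_parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.matched and tag == self.tag:
            self.matched = True
            self.attrs = dict(attrs)
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth > 0:
            self.html_parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Note: character references arrive already decoded in this sketch.
        if self.depth > 0:
            self.text_parts.append(data)
            self.html_parts.append(data)


def extract_first(html, tag, output="text", attr=""):
    parser = FirstTagExtractor(tag)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    if not parser.matched:
        return None  # assumption of this sketch, not documented node behavior
    if output == "attr":
        return parser.attrs.get(attr)
    if output == "html":
        return "".join(parser.html_parts)
    return "".join(parser.text_parts)
```

Calling `extract_first("<h1>Hello <em>World</em></h1>", "h1", "html")` keeps the `<em>` tags, while the default text mode drops them and keeps only the character data.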

Example Flows

Extract All Links from a Page

Fetch a web page and extract every hyperlink URL.

[
  {
    "id": "fetch-page",
    "type": "http-request",
    "method": "GET",
    "url": "https://example.com",
    "ret": "txt"
  },
  {
    "id": "extract-links",
    "type": "html",
    "selector": "a",
    "output": "attr",
    "attr": "href",
    "multiple": true
  },
  {
    "id": "show-links",
    "type": "debug",
    "name": "All Links"
  }
]

// Output:
// {
//   "payload": [
//     "https://example.com/about",
//     "https://example.com/contact",
//     "https://example.com/blog"
//   ]
// }
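
The output above shows absolute URLs, which is what you get when the page itself uses absolute href values. Since attr mode returns the attribute value, pages that use relative links (such as "/about") would presumably yield relative strings. A post-processing step can resolve them against the page URL. The Python sketch below uses the standard library's `urljoin`. The function name `resolve_links` is hypothetical, and dropping empty values and duplicates is a design choice of the sketch.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve each href against base_url, dropping empty values and duplicates."""
    seen = set()
    resolved = []
    for href in hrefs:
        if not href:
            # Skip anchors without a usable href value.
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved
```

Already-absolute URLs pass through `urljoin` unchanged, so the same step is safe for pages that mix both styles.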

Get Page Title

Extract the title from an HTML page for monitoring or logging.

[
  {
    "id": "fetch-page",
    "type": "http-request",
    "method": "GET",
    "url": "https://example.com",
    "ret": "txt"
  },
  {
    "id": "get-title",
    "type": "html",
    "selector": "title",
    "output": "text",
    "multiple": false
  },
  {
    "id": "log-title",
    "type": "debug",
    "name": "Page Title"
  }
]

// Output:
// { "payload": "Example Domain" }

Extract Table Data

Scrape tabular data from a web page and display it in a dashboard table.

[
  {
    "id": "fetch-data",
    "type": "http-request",
    "method": "GET",
    "url": "https://example.com/data",
    "ret": "txt"
  },
  {
    "id": "extract-cells",
    "type": "html",
    "selector": "table.data tr td",
    "output": "text",
    "multiple": true
  },
  {
    "id": "format-table",
    "type": "function",
    "name": "Reshape to rows"
  },
  {
    "id": "dashboard-table",
    "type": "ui-table",
    "name": "Scraped Data"
  }
]

// Output from html node:
// {
//   "payload": [
//     "Sensor A", "22.5", "Online",
//     "Sensor B", "18.3", "Offline",
//     "Sensor C", "25.1", "Online"
//   ]
// }
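
The html node returns the cells as one flat array, so the "Reshape to rows" step has to group them back into rows. The flow above does not show that function's body. The following Python sketch shows one way the grouping logic could look. The function name `reshape_rows` and the column names are hypothetical. Column names must be supplied by hand because the selector "table.data tr td" matches only td cells, not th header cells.

```python
def reshape_rows(cells, columns):
    """Group a flat list of cell values into one dict per table row."""
    width = len(columns)
    if width == 0:
        raise ValueError("at least one column name is required")
    if len(cells) % width != 0:
        # A ragged table (colspan, missing cells) cannot be split evenly.
        raise ValueError(
            f"{len(cells)} cells cannot be split into rows of {width}"
        )
    return [
        dict(zip(columns, cells[i:i + width]))
        for i in range(0, len(cells), width)
    ]


# Hypothetical column names for the sensor table shown above.
rows = reshape_rows(
    ["Sensor A", "22.5", "Online",
     "Sensor B", "18.3", "Offline",
     "Sensor C", "25.1", "Online"],
    ["sensor", "reading", "status"],
)
```

Each entry in `rows` is then a dict such as `{"sensor": "Sensor A", "reading": "22.5", "status": "Online"}`. The readings stay strings, so convert them to numbers separately if the dashboard needs numeric values.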

CSS Selector Reference

| Selector | Example | Matches |
| --- | --- | --- |
| tag | h1 | All h1 elements |
| .class | .price | Elements with class "price" |
| #id | #main-content | Element with id "main-content" |
| tag.class | div.card | div elements with class "card" |
| parent child | ul li | li elements inside ul |
| [attr] | a[target] | Links with a target attribute |
| [attr=val] | input[type="text"] | Text input elements |

Common Use Cases

Web Scraping

Extract product prices, news headlines, or weather data from websites for IoT dashboards.

API Response Parsing

Parse HTML fragments returned by APIs or legacy services that don't provide JSON.

Link Monitoring

Monitor web pages for broken links, new content, or changes to specific elements.

Content Transformation

Strip HTML to plain text, extract specific sections, or reformat content for notifications.
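
For the monitoring use case, detecting "changes to specific elements" usually comes down to comparing the current extraction with the previous one. A minimal Python sketch, assuming the previous fingerprint is stored somewhere between runs (the storage itself is out of scope here). The names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib
import json


def fingerprint(payload):
    """Return a stable SHA-256 hex digest for a string or list payload."""
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, payload):
    """Compare payload with the stored fingerprint.

    Returns (changed, current_fingerprint) so the caller can persist
    the new value for the next run.
    """
    current = fingerprint(payload)
    return current != previous_fingerprint, current
```

Storing only the digest keeps the saved state small, at the cost of not being able to show what changed. Keep the previous payload as well if a diff is needed.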
