Scrape URL
Scrape a webpage and convert it to clean, LLM-ready markdown or other formats. Handles JavaScript rendering, extracts main content, and returns structured data with metadata. Ideal for: content extraction, RAG data collection, web research, article summarization, data aggregation.
At a Glance
| Field | Value |
|---|---|
| Action ID | firecrawl-scrape-url |
| Category | Integrations |
| Connector | Not required |
| Requires gas | No |
| Funds movement | None declared |
| Tags | firecrawl, scrape, web, content, markdown, extraction, read |
Payload Schema
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape. Must be a valid HTTP/HTTPS URL. |
| formats | array | No | Output formats to return. Options: 'markdown' (clean text), 'html' (processed HTML), 'rawHtml' (original HTML), 'links' (extracted URLs), 'screenshot' (page image), 'json' (structured data), 'summary' (AI summary), 'images' (image URLs). Default: ['markdown'] |
| onlyMainContent | boolean | No | Extract only the main content, excluding headers, navigation, and footers. Default: true |
| waitFor | number | No | Time in milliseconds to wait for JavaScript to render before scraping. Useful for dynamic sites. Range: 0-30000. Deprecated: use actions instead. |
| actions | array | No | Page actions to perform before scraping. Use for JavaScript-heavy sites that need interaction before content is visible. |
| timeout | number | No | Request timeout in milliseconds. Default: 30000 (30 seconds). Max: 300000 (5 minutes). |
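
The constraints in the table above can be checked locally before a request is sent. The following is a minimal sketch, not part of the documented API: `build_scrape_payload` is a hypothetical helper, and the lower bound on `timeout` (greater than zero) is an assumption, since the table only states a maximum. `waitFor` and `actions` are left out because the former is deprecated and the shape of the latter is not documented here.

```python
from urllib.parse import urlparse

# Values copied from the Payload Schema table.
ALLOWED_FORMATS = {
    "markdown", "html", "rawHtml", "links",
    "screenshot", "json", "summary", "images",
}
MAX_TIMEOUT_MS = 300_000


def build_scrape_payload(url, formats=None, only_main_content=None, timeout=None):
    """Hypothetical helper: validate inputs and return a payload dict."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be a valid HTTP/HTTPS URL: {url!r}")

    payload = {"url": url}

    if formats is not None:
        unknown = set(formats) - ALLOWED_FORMATS
        if unknown:
            raise ValueError(f"unsupported formats: {sorted(unknown)}")
        payload["formats"] = list(formats)

    if only_main_content is not None:
        payload["onlyMainContent"] = bool(only_main_content)

    if timeout is not None:
        # ASSUMPTION: timeouts must be positive; only the maximum is documented.
        if not 0 < timeout <= MAX_TIMEOUT_MS:
            raise ValueError(f"timeout must be in (0, {MAX_TIMEOUT_MS}] ms")
        payload["timeout"] = timeout

    return payload
```

Optional fields that are not passed are omitted from the payload, so the defaults listed in the table apply on the server side.
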
Result Schema
| Field | Type | Required | Description |
|---|---|---|---|
| success | boolean | Yes | Whether the scrape request was successful |
| data | object | Yes | - |
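
A caller would typically check `success` before reading `data`. The sketch below shows one way to do that. The contents of `data` are not documented in this section, so the `markdown` key is an assumption based on the default output format, and `extract_markdown` is a hypothetical helper name.

```python
def extract_markdown(result):
    """Return the markdown text from a scrape result dict.

    Raises RuntimeError if the result reports failure.
    """
    if not result.get("success"):
        raise RuntimeError("scrape request was not successful")

    data = result.get("data") or {}
    # ASSUMPTION: the markdown output is stored under data["markdown"].
    # The Result Schema above does not describe the fields inside `data`.
    return data.get("markdown")
```

The function returns `None` when the assumed key is absent, which lets the caller distinguish a failed scrape from a successful one that did not include markdown output.
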
Examples

Workflow node definition:

    {
      "type": "firecrawl-scrape-url",
      "payload": {
        "url": "https://example.com/webhook"
      },
      "children": []
    }

Test request with curl:

    curl -X POST "https://api.b3os.org/v1/actions/firecrawl-scrape-url/test" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{ "inputs": { "url": "https://example.com/webhook" }}'

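
The same test request can be built from Python with the standard library. This is a rough equivalent of the curl call, using the endpoint, headers, and `inputs` body shown there. `build_test_request` and `send` are hypothetical helper names, and the sketch assumes the response body is JSON.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.b3os.org/v1/actions/firecrawl-scrape-url/test"


def build_test_request(api_key, url):
    """Build (but do not send) the POST request for the test endpoint."""
    body = json.dumps({"inputs": {"url": url}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout_s=30):
    """Send the request and decode the response.

    ASSUMPTION: the endpoint responds with a JSON body.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

Calling `send(build_test_request("YOUR_API_KEY", "https://example.com/webhook"))` performs the request. Keeping construction separate from sending makes the request easy to inspect without network access.
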
Payload fields can use workflow expressions such as {{$trigger.body.amount}}, {{$nodes.fetch.result.price}}, and {{$props.asset}} when the value should come from a trigger, prior node, or reusable workflow prop.
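
For this action, the most likely field to fill from an expression is `url`. The sketch below generates a node definition whose URL comes from the trigger. The expression `{{$trigger.body.url}}` is a hypothetical example that assumes the trigger body carries a `url` field, and `scrape_node` is an illustrative helper, not part of the platform.

```python
import json


def scrape_node(url_expression):
    """Return a workflow node dict in the shape of the JSON example above."""
    return {
        "type": "firecrawl-scrape-url",
        "payload": {
            # The expression is passed through as a plain string;
            # resolving it is left to the workflow engine.
            "url": url_expression,
            "formats": ["markdown"],
        },
        "children": [],
    }


# ASSUMPTION: the trigger body contains a "url" field.
node = scrape_node("{{$trigger.body.url}}")
print(json.dumps(node, indent=2))
```
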
