feat(metadata): extract <link rel="alternate"> tags into metadata by firecrawl-spring[bot] · Pull Request #3250 · firecrawl/firecrawl

firecrawl-spring · 2026-03-30T13:48:06Z

Summary

- Adds alternateLinks field to scrape response metadata that captures all <link rel="alternate"> tags from HTML <head>
- Each entry includes href, type, title, and hreflang attributes
- Enables RSS/Atom feed discovery and hreflang detection without requiring rawHtml + manual parsing

Changes

- Rust native extractor (html.rs): Added extraction of <link rel="alternate"> tags after DC terms metadata
- TypeScript fallback (extractMetadata.ts): Added matching cheerio-based extraction
- Types (v1/types.ts, v2/types.ts): Added alternateLinks field to Document metadata type

Example response

{
  "metadata": {
    "alternateLinks": [
      {
        "href": "https://www.saastr.com/feed/",
        "type": "application/rss+xml",
        "title": "SaaStr RSS Feed"
      },
      {
        "href": "https://www.saastr.com/feed/atom/",
        "type": "application/atom+xml",
        "title": "SaaStr Atom Feed"
      }
    ]
  }
}
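To illustrate how a client might consume this field, here is a minimal sketch. The `AlternateLink` and `ScrapeMetadata` types and the helpers `findFeedUrls` and `findLanguageVariants` are hypothetical names introduced for this example. Only the field names (`alternateLinks`, `href`, `type`, `title`, `hreflang`) come from the PR description.

```typescript
// Hypothetical entry shape; the field names follow the PR description.
interface AlternateLink {
  href?: string;
  type?: string;
  title?: string;
  hreflang?: string;
}

// Assumed minimal view of the scrape response metadata.
interface ScrapeMetadata {
  alternateLinks?: AlternateLink[];
}

// MIME types conventionally used to advertise RSS and Atom feeds.
function isFeedType(type: string | undefined): boolean {
  if (type === undefined) return false;
  const t = type.trim().toLowerCase();
  return t === "application/rss+xml" || t === "application/atom+xml";
}

// Collect feed URLs; an absent alternateLinks field yields an empty list.
function findFeedUrls(metadata: ScrapeMetadata): string[] {
  const urls: string[] = [];
  for (const link of metadata.alternateLinks ?? []) {
    if (link.href !== undefined && isFeedType(link.type)) {
      urls.push(link.href);
    }
  }
  return urls;
}

// Collect hreflang -> href pairs for language variants of the page.
function findLanguageVariants(metadata: ScrapeMetadata): Record<string, string> {
  const variants: Record<string, string> = {};
  for (const link of metadata.alternateLinks ?? []) {
    if (link.href !== undefined && link.hreflang !== undefined) {
      variants[link.hreflang] = link.href;
    }
  }
  return variants;
}
```

Because the field is omitted when a page has no alternate links, both helpers fall back to an empty array rather than assuming the key is present.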

Context

Customer request — currently users need to request rawHtml and parse <link rel="alternate"> tags themselves to discover RSS/Atom feeds or hreflang links. This makes it a first-class metadata field.

Test plan

- Scrape saastr.com and verify alternateLinks contains RSS and Atom feed entries
- Scrape a site with hreflang tags and verify hreflang attribute is captured
- Scrape a site with no <link rel="alternate"> tags and verify field is absent (not empty array)
- Verify Rust extractor produces same results as TypeScript fallback
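The "absent, not empty array" and extractor-parity checks can be sketched as plain functions. This is an illustration under assumptions, not the PR's actual test code: `Metadata`, `withAlternateLinks`, and `sameAlternateLinks` are hypothetical names, and the real extractors may attach the field differently.

```typescript
// Hypothetical entry shape; the field names follow the PR description.
interface AlternateLink {
  href?: string;
  type?: string;
  title?: string;
  hreflang?: string;
}

// Assumed, simplified metadata shape for this sketch.
interface Metadata {
  title?: string;
  alternateLinks?: AlternateLink[];
}

// Attach alternateLinks only when at least one link was found,
// so the key is absent (rather than []) for pages without any.
function withAlternateLinks(base: Metadata, links: AlternateLink[]): Metadata {
  if (links.length === 0) {
    return { ...base };
  }
  return { ...base, alternateLinks: links };
}

// Compare the output of two extractors entry by entry, in order.
// Two absent fields count as equal; absent vs. present does not.
function sameAlternateLinks(a?: AlternateLink[], b?: AlternateLink[]): boolean {
  if (a === undefined || b === undefined) return a === b;
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    const x = a[i];
    const y = b[i];
    if (
      x.href !== y.href ||
      x.type !== y.type ||
      x.title !== y.title ||
      x.hreflang !== y.hreflang
    ) {
      return false;
    }
  }
  return true;
}
```

Checking with `"alternateLinks" in metadata` distinguishes an omitted key from an empty array, which a truthiness check on `metadata.alternateLinks?.length` would not.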

Summary by cubic

Adds alternateLinks to document metadata by extracting all <link rel="alternate"> tags from the HTML head. This enables RSS/Atom feed discovery and hreflang detection without parsing rawHtml.

New Features
- Captures href, type, title, and hreflang for each alternate link.
- Implemented in the Rust native extractor and the TypeScript cheerio fallback; field is omitted when no matches exist.
- Updated v1 and v2 Document types to include alternateLinks.

Written for commit 493f43f. Summary will update on new commits.

Add `alternateLinks` field to metadata that captures all <link rel="alternate"> tags from HTML head, including href, type, title, and hreflang attributes. This enables feed discovery (RSS/Atom) and hreflang detection without requiring rawHtml + manual parsing. Implemented in both the Rust native extractor and the TypeScript/cheerio fallback, with corresponding type updates in v1 and v2 Document types.

Co-Authored-By: micahstairs <micah@sideguide.dev>

cubic-dev-ai

1 issue found across 4 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/api/src/scraper/scrapeURL/lib/extractMetadata.ts">

<violation number="1" location="apps/api/src/scraper/scrapeURL/lib/extractMetadata.ts:156">
P2: Match `rel` as a token, not an exact string. `rel` is space-separated; the current selector misses valid `<link rel="alternate ...">` tags that include additional tokens.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

cubic-dev-ai · 2026-03-30T13:51:56Z

+      title?: string;
+      hreflang?: string;
+    }[] = [];
+    soup('link[rel="alternate"]').each((_, elem) => {


P2: Match rel as a token, not an exact string. rel is space-separated; the current selector misses valid <link rel="alternate ..."> tags that include additional tokens.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At apps/api/src/scraper/scrapeURL/lib/extractMetadata.ts, line 156:
<comment>Match `rel` as a token, not an exact string. `rel` is space-separated; the current selector misses valid `<link rel="alternate ...">` tags that include additional tokens.</comment>
<file context>
@@ -143,6 +146,32 @@ export async function extractMetadata(
+      title?: string;
+      hreflang?: string;
+    }[] = [];
+    soup('link[rel="alternate"]').each((_, elem) => {
+      const link: {
+        href?: string;
</file context>

Suggested change

-    soup('link[rel="alternate"]').each((_, elem) => {
+    soup('link[rel~="alternate"]').each((_, elem) => {
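The difference between the two selectors can be shown without any HTML parser. In the sketch below, `relEquals` mimics the exact-match semantics of `[rel="alternate"]` and `relHasToken` mimics the whitespace-separated token match of `[rel~="alternate"]`. Both function names are hypothetical, and the comparison is case-sensitive as a simplification.

```typescript
// Exact attribute-value match, in the spirit of the CSS selector [rel="..."].
function relEquals(rel: string, value: string): boolean {
  return rel === value;
}

// Whitespace-separated token match, in the spirit of [rel~="..."].
// Splits the attribute value on runs of whitespace and looks for the token.
// Simplification: tokens are compared case-sensitively here.
function relHasToken(rel: string, token: string): boolean {
  const tokens = rel.split(/\s+/).filter((t) => t.length > 0);
  return tokens.indexOf(token) !== -1;
}

// Sample rel values a page might use on <link> elements.
const relValues = ["alternate", "alternate stylesheet", "home alternate", "alternative"];

for (const rel of relValues) {
  console.log(
    JSON.stringify(rel),
    "exact:", relEquals(rel, "alternate"),
    "token:", relHasToken(rel, "alternate"),
  );
}
```

Only the first value passes the exact match, while the token match also accepts the multi-token values and still rejects `alternative`, which merely contains the substring.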

firecrawl-spring Bot requested a review from mogery as a code owner March 30, 2026 13:48

firecrawl-spring Bot requested a review from micahstairs March 30, 2026 13:48

cubic-dev-ai Bot reviewed Mar 30, 2026

firecrawl-spring[bot] wants to merge 1 commit into main from feat/metadata-alternate-links
