improvement(seo): optimize sitemaps and robots.txt across sim and docs#4170
emir-karabeg wants to merge 2 commits into staging from
- Add missing pages to sim sitemap: blog author pages, academy catalog and course pages
- Fix 6x duplicate URL bug in docs sitemap by deduplicating with source.getLanguages()
- Convert docs sitemap from route handler to Next.js metadata convention with native hreflang
- Add x-default hreflang alternate for docs multi-language pages
- Remove changeFrequency and priority fields (Google ignores both)
- Fix inaccurate lastModified timestamps: derive from real content dates, omit when unknown
- Consolidate 20+ redundant per-bot robots rules into a single wildcard entry
- Add /form/ and /credential-account/ to sim robots disallow list
- Reference image sitemap in sim robots.txt
- Remove deprecated host directive from sim robots
- Move disallow rules before allow in docs robots for crawler compatibility
- Extract hardcoded docs baseUrl to env variable with production fallback
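The robots items above (one wildcard group, the two new disallow paths, the image sitemap reference, no host directive, Disallow lines ahead of Allow lines) can be sketched as a plain config object plus a hand-written renderer of the kind a route handler like the docs robots.txt/route.ts would need. This is a minimal illustration, not the PR's code: the RobotsRule/RobotsConfig types only loosely mirror Next.js's MetadataRoute.Robots, the /sitemap.xml path and the example origin are assumptions, and the real disallow list may contain entries beyond the two named in the PR.

```typescript
// Loose stand-ins for a robots rule group and config; not Next.js's exact MetadataRoute.Robots type.
interface RobotsRule {
  userAgent: string
  allow: string[]
  disallow: string[]
}

interface RobotsConfig {
  rules: RobotsRule[]
  sitemap: string[]
}

function buildRobots(baseUrl: string): RobotsConfig {
  return {
    // A single wildcard group instead of 20+ identical per-bot groups.
    rules: [
      {
        userAgent: '*',
        allow: ['/'],
        // Only these two paths are named in the PR; the real list may have other entries.
        disallow: ['/form/', '/credential-account/'],
      },
    ],
    // The main sitemap path is assumed; the image sitemap path comes from the PR. No host directive.
    sitemap: [`${baseUrl}/sitemap.xml`, `${baseUrl}/blog/sitemap-images.xml`],
  }
}

// Serialize to robots.txt text, emitting Disallow lines before Allow lines in each group.
function renderRobotsTxt(config: RobotsConfig): string {
  const lines: string[] = []
  for (const rule of config.rules) {
    lines.push(`User-agent: ${rule.userAgent}`)
    for (const path of rule.disallow) lines.push(`Disallow: ${path}`)
    for (const path of rule.allow) lines.push(`Allow: ${path}`)
    lines.push('') // a blank line ends the group
  }
  for (const url of config.sitemap) lines.push(`Sitemap: ${url}`)
  return lines.join('\n') + '\n'
}

// Example with a placeholder origin.
console.log(renderRobotsTxt(buildRobots('https://example.com')))
```

Keeping the rules as data and rendering them in one place makes the Disallow-before-Allow ordering a property of the renderer rather than something each rule has to get right.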
PR Summary: Medium Risk
Overview: Sim’s sitemap (
Reviewed by Cursor Bugbot for commit eeac0f8.
Greptile Summary
This PR fixes SEO issues across two apps: the docs sitemap had a 6× duplicate-URL bug (one entry per locale with no hreflang), and the sim sitemap was missing public pages while using
Confidence Score: 5/5
Safe to merge: all findings are P2 suggestions that don't block correctness in production.
The two remaining findings are both P2: the homepage still using new Date() is an inconsistency with the PR's stated goal but doesn't break anything, and the missing empty-array guard on latestModelDate is theoretical given the static data always contains models. All structural changes are correct.
apps/sim/app/sitemap.ts: homepage lastModified and latestModelDate guard.
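To make the duplicate-URL fix concrete, here is a minimal sketch of grouping per-locale pages by slug so each page yields one sitemap entry carrying hreflang alternates plus an x-default. It does not call Fumadocs: the LocalizedPage records stand in for what source.getLanguages() provides, and the URL scheme (default locale unprefixed, other locales under /<locale>), the 'en' default, and the SitemapEntry type are assumptions.

```typescript
// Stand-in for one page in one locale; the real data comes from Fumadocs' source.getLanguages().
interface LocalizedPage {
  locale: string // e.g. 'en', 'es'
  slug: string // locale-independent path, '' for the docs root
}

// Only the MetadataRoute.Sitemap fields used here.
interface SitemapEntry {
  url: string
  alternates: { languages: Record<string, string> }
}

const DEFAULT_LOCALE = 'en' // assumption

// Assumed URL scheme: default locale unprefixed, other locales under /<locale>.
function localizedUrl(baseUrl: string, locale: string, slug: string): string {
  const prefix = locale === DEFAULT_LOCALE ? '' : `/${locale}`
  return `${baseUrl}${prefix}${slug ? `/${slug}` : ''}`
}

function buildDocsSitemap(baseUrl: string, pages: LocalizedPage[]): SitemapEntry[] {
  // Group locales by slug so a page translated into N locales yields one entry, not N.
  const localesBySlug: Record<string, string[]> = {}
  for (const { locale, slug } of pages) {
    const locales = localesBySlug[slug] || (localesBySlug[slug] = [])
    if (locales.indexOf(locale) < 0) locales.push(locale)
  }

  return Object.keys(localesBySlug).map((slug) => {
    const locales = localesBySlug[slug]
    const languages: Record<string, string> = {}
    for (const locale of locales) languages[locale] = localizedUrl(baseUrl, locale, slug)
    // The entry URL and x-default use the default locale when the page exists in it.
    const primary = locales.indexOf(DEFAULT_LOCALE) >= 0 ? DEFAULT_LOCALE : locales[0]
    languages['x-default'] = localizedUrl(baseUrl, primary, slug)
    // lastModified is omitted: no accurate per-page date without a git plugin.
    return { url: localizedUrl(baseUrl, primary, slug), alternates: { languages } }
  })
}
```

With six locales, the old per-locale loop would emit six entries per page; grouping by slug emits one, and the per-locale URLs move into alternates.languages instead.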
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["sitemap()"] --> B["getAllPostMeta()"]
B --> C{"posts.length > 0?"}
C -- yes --> D["latestPostDate = max post date"]
C -- no --> E["latestPostDate = undefined"]
A --> F["latestModelDate = max model updatedAt (no guard)"]
A --> G["staticPages: homepage uses new Date()"]
A --> H["blogPages: per-post updated/date"]
A --> I["authorPages: max post date per author"]
A --> J["integrationPages: no lastModified"]
A --> K["providerPages: max model date"]
A --> L["modelEntries: per-model updatedAt"]
A --> M["academyPages: no lastModified"]
D --> G
E --> G
F --> G
G --> N["MetadataRoute.Sitemap"]
H --> N
I --> N
J --> N
K --> N
L --> N
M --> N
```
Reviews (1): Last reviewed commit: "improvement(seo): optimize sitemaps and ..."
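The date logic in the flowchart can be sketched as follows: blog pages use each post's updated-or-published date, author pages take the latest date among that author's posts, and latestPostDate is guarded so an empty post list yields undefined rather than an invalid date. The PostMeta fields, the URL paths, and the choice of the /blog index as the consumer of latestPostDate are illustrative assumptions, not the real getAllPostMeta() types.

```typescript
// Hypothetical post metadata; the real getAllPostMeta() type may differ.
interface PostMeta {
  slug: string
  date: string // ISO publish date
  updated?: string // ISO last-updated date, if any
  authors: string[] // author ids
}

interface SitemapEntry {
  url: string
  lastModified?: Date
}

// A post's effective modification time: `updated` if present, else `date`.
function postTime(post: PostMeta): number {
  return new Date(post.updated ?? post.date).getTime()
}

function blogEntries(baseUrl: string, posts: PostMeta[]): SitemapEntry[] {
  const blogPages: SitemapEntry[] = posts.map((post) => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(postTime(post)),
  }))

  // Latest post time per author id.
  const latestByAuthor: Record<string, number> = {}
  for (const post of posts) {
    const t = postTime(post)
    for (const id of post.authors) {
      const seen = Object.prototype.hasOwnProperty.call(latestByAuthor, id)
      if (!seen || t > latestByAuthor[id]) latestByAuthor[id] = t
    }
  }
  const authorPages: SitemapEntry[] = Object.keys(latestByAuthor).map((id) => ({
    url: `${baseUrl}/blog/authors/${id}`,
    lastModified: new Date(latestByAuthor[id]),
  }))

  // Guarded: with no posts there is no real date to report, so leave it undefined.
  const latestPostDate =
    posts.length > 0 ? new Date(Math.max(...posts.map(postTime))) : undefined
  const blogIndex: SitemapEntry = { url: `${baseUrl}/blog`, lastModified: latestPostDate }

  return [blogIndex, ...blogPages, ...authorPages]
}
```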
```diff
   url: baseUrl,
-  lastModified: now,
-  changeFrequency: 'daily',
-  priority: 1.0,
+  lastModified: new Date(),
 },
```
Homepage still uses new Date() as lastModified
The PR's stated rationale is that "new Date() as lastModified on static content trains Google to distrust the signal." All other static pages were either given accurate content-derived dates or had lastModified omitted — but the homepage still has new Date(), so it'll reflect the sitemap generation time rather than an actual page-change date. Without a revalidate export in this file, each ISR revalidation produces a new value.
Suggested change:
```ts
{
  url: baseUrl,
},
```
Or use a pinned date that reflects the last meaningful homepage change.
apps/sim/app/sitemap.ts (outdated)
```ts
const latestModelDate = new Date(
  Math.max(
    ...MODEL_PROVIDERS_WITH_CATALOGS.flatMap((provider) =>
      provider.models.map((model) => new Date(model.pricing.updatedAt).getTime())
    )
  )
)
```
Missing empty-array guard — potential Invalid Date
latestPostDate is guarded with posts.length > 0, but latestModelDate is not. If MODEL_PROVIDERS_WITH_CATALOGS were empty (or all providers had zero models after the filter), flatMap produces [], Math.max(...[]) returns -Infinity, and new Date(-Infinity).toISOString() throws RangeError: Invalid time value, crashing sitemap generation. Applying the same guard pattern would be consistent:
Suggested change:
```diff
-const latestModelDate = new Date(
-  Math.max(
-    ...MODEL_PROVIDERS_WITH_CATALOGS.flatMap((provider) =>
-      provider.models.map((model) => new Date(model.pricing.updatedAt).getTime())
-    )
-  )
-)
+const modelTimes = MODEL_PROVIDERS_WITH_CATALOGS.flatMap((provider) =>
+  provider.models.map((model) => new Date(model.pricing.updatedAt).getTime())
+)
+const latestModelDate = modelTimes.length > 0 ? new Date(Math.max(...modelTimes)) : undefined
```
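A small standalone check of the failure mode described in this comment: with an empty list, Math.max() returns -Infinity, the resulting Date is invalid, and toISOString() throws a RangeError, while the guarded helper returns undefined. ProviderCatalog is a hypothetical stand-in for the shape of MODEL_PROVIDERS_WITH_CATALOGS, reduced to the fields the snippet reads.

```typescript
// Unguarded: Math.max() of no values is -Infinity, which produces an Invalid Date.
function latestDateUnguarded(times: number[]): Date {
  return new Date(Math.max(...times))
}

// Guarded, matching the suggested change: undefined when there is nothing to compare.
function latestDate(times: number[]): Date | undefined {
  return times.length > 0 ? new Date(Math.max(...times)) : undefined
}

// Hypothetical catalog shape; only the fields read by the sitemap snippet.
interface ProviderCatalog {
  models: { pricing: { updatedAt: string } }[]
}

function latestModelDate(providers: ProviderCatalog[]): Date | undefined {
  const modelTimes: number[] = []
  for (const provider of providers) {
    for (const model of provider.models) {
      modelTimes.push(new Date(model.pricing.updatedAt).getTime())
    }
  }
  return latestDate(modelTimes)
}

console.log(latestDate([])) // undefined

try {
  latestDateUnguarded([]).toISOString()
} catch (err) {
  // RangeError: Invalid time value
  console.log(err instanceof RangeError)
}
```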
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Changelog lastModified incorrectly derived from blog posts
  - Removed the incorrect lastModified from the changelog sitemap entry since it's driven by GitHub releases, not blog posts, matching the pattern used for /partners.
Or push these changes by commenting:
@cursor push 473032693b
Preview (473032693b)
```diff
diff --git a/apps/sim/app/sitemap.ts b/apps/sim/app/sitemap.ts
--- a/apps/sim/app/sitemap.ts
+++ b/apps/sim/app/sitemap.ts
@@ -37,7 +37,6 @@
     },
     {
       url: `${baseUrl}/changelog`,
-      lastModified: latestPostDate,
     },
     {
       url: `${baseUrl}/integrations`,
```
```diff
   url: `${baseUrl}/changelog`,
-  lastModified: now,
+  lastModified: latestPostDate,
 },
```
Changelog lastModified incorrectly derived from blog posts
Medium Severity
The /changelog entry uses latestPostDate as its lastModified, but the changelog page is driven by GitHub releases (fetched from api.github.com/repos/simstudioai/sim/releases), not blog posts. This gives search engines an inaccurate modification date that reflects the latest blog post update rather than when the changelog actually changed. Given the PR's stated goal of fixing inaccurate lastModified timestamps, this entry would be better off omitting lastModified entirely (like /partners) since no accurate source is available at sitemap-generation time.



Summary
x-default, changeFrequency/priority (Google ignores both), fix inaccurate lastModified timestamps

Context
An SEO audit found several issues: the docs sitemap generated every page 6x (once per language) without hreflang alternates, the sim sitemap was missing public pages and used new Date() as lastModified on static content (which trains Google to distrust the signal), and robots.txt had 20+ identical bot-specific rules that added noise with no effect.

Changes
Sim sitemap (apps/sim/app/sitemap.ts)
- Add blog author pages (/blog/authors/[id]) with lastModified derived from each author's latest post
- Add academy catalog and course pages (/academy, /academy/[courseSlug])
- Remove changeFrequency and priority fields (confirmed ignored by Google)

Sim robots (apps/sim/app/robots.ts)
- Consolidate 20+ per-bot rules into a single * wildcard entry
- Add /form/ and /credential-account/ to disallow list
- Reference image sitemap (/blog/sitemap-images.xml)
- Remove deprecated host directive

Docs sitemap (apps/docs/app/sitemap.ts, new; replaces apps/docs/app/sitemap.xml/route.ts)
- Convert to Next.js MetadataRoute.Sitemap convention
- Use source.getLanguages() from Fumadocs to deduplicate pages by slug
- Add alternates.languages with x-default for all 6 locales
- Omit lastModified (no accurate source available without git plugin; absent is better than inaccurate)

Docs robots (apps/docs/app/robots.txt/route.ts)
- Move disallow rules before allow rules in the User-agent: * group

Type of Change
Testing
- canonical field uses absolute URLs (enforced by Zod z.string().url())
- authors field is always populated (throws if empty)
- slug, model href, and course slug values match their routes

Checklist