## Summary
Port the `discoverValidSitemaps()` utility from Crawlee JS to Python.
JS source: `packages/utils/src/internals/sitemap.ts` (#3392)
## How it works in JS

```ts
async function* discoverValidSitemaps(
    urls: string[],
    options?: { proxyUrl?: string; httpClient?: BaseHttpClient },
): AsyncIterable<string>
```
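A possible Python-side signature for the port could look like the following. Everything here (name, parameter names, the `Any` stand-in for the HTTP client type) is an assumption for illustration, not a decided API:

```python
from collections.abc import AsyncIterator
from typing import Any

async def discover_valid_sitemaps(
    urls: list[str],
    *,
    proxy_url: str | None = None,
    http_client: Any = None,  # stand-in for Crawlee's HTTP client abstraction
) -> AsyncIterator[str]:
    """Yield sitemap URLs for the given page URLs as they are discovered."""
    raise NotImplementedError
    yield  # unreachable, but marks this function as an async generator
```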
- Group the input URLs by hostname.
- For each domain, discover sitemaps from (in order):
  - `Sitemap:` entries in robots.txt
  - input URLs that match `/sitemap\.(xml|txt)(\.gz)?$/i`
  - HEAD-request probing of `/sitemap.xml`, `/sitemap.txt`, and `/sitemap_index.xml` (fallback)
- Deduplicate results and process domains concurrently.

The function returns an async iterable that yields sitemap URLs as they are discovered.
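A minimal Python sketch of the per-domain discovery order, assuming `RobotsTxtFile` lives in `crawlee._utils.robots` and that `find()` takes a single URL (both assumptions), and using `httpx` directly for the HEAD probe where the real port would go through Crawlee's HTTP client:

```python
import re

import httpx

from crawlee._utils.robots import RobotsTxtFile  # module path assumed

SITEMAP_URL_RE = re.compile(r'/sitemap\.(xml|txt)(\.gz)?$', re.IGNORECASE)
COMMON_PATHS = ('/sitemap.xml', '/sitemap.txt', '/sitemap_index.xml')

async def discover_for_domain(base_url: str, input_urls: list[str]) -> list[str]:
    """Apply the JS discovery order to a single domain."""
    found: list[str] = []

    # 1. Sitemap: entries from robots.txt.
    robots = await RobotsTxtFile.find(base_url)  # signature assumed
    found.extend(robots.get_sitemaps())

    # 2. Input URLs that already look like sitemap URLs.
    found.extend(url for url in input_urls if SITEMAP_URL_RE.search(url))

    # 3. Fallback: HEAD-probe the common sitemap locations.
    if not found:
        async with httpx.AsyncClient() as client:
            for path in COMMON_PATHS:
                candidate = base_url.rstrip('/') + path
                response = await client.head(candidate, follow_redirects=True)
                if response.status_code == 200:
                    found.append(candidate)

    # Deduplicate while preserving discovery order.
    return list(dict.fromkeys(found))
```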
## What Python already has
- `Sitemap.try_common_names()` — probes `/sitemap.xml` and `/sitemap.txt` for a single URL (missing `/sitemap_index.xml`)
- `RobotsTxtFile.find()` + `get_sitemaps()` — fetches and extracts `Sitemap:` entries from robots.txt
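For reference, a usage sketch of the existing probe helper; the module path and exact signature are assumed from the names above rather than verified:

```python
from crawlee._utils.sitemap import Sitemap  # module path assumed

async def probe(url: str) -> None:
    # Derives /sitemap.xml and /sitemap.txt from the given URL and tries both;
    # /sitemap_index.xml is not attempted yet, as noted above.
    sitemap = await Sitemap.try_common_names(url)  # signature assumed
    print(sitemap.urls)  # assuming the result exposes the collected URLs
```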
## What's missing
The orchestrating function that ties these steps together: grouping input URLs by hostname, detecting direct sitemap URLs in the input, validating candidates via HEAD requests, and processing domains concurrently.
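A hedged sketch of that orchestration, reusing the hypothetical `discover_for_domain()` from the earlier snippet (proxy and HTTP-client options omitted for brevity). The concurrency strategy, one task per domain with results yielded as each finishes, mirrors the JS description but is otherwise an assumption:

```python
import asyncio
from collections.abc import AsyncIterator
from urllib.parse import urlparse

async def discover_valid_sitemaps(urls: list[str]) -> AsyncIterator[str]:
    # Group input URLs by hostname so each domain is processed once.
    by_host: dict[str, list[str]] = {}
    for url in urls:
        host = urlparse(url).hostname
        if host:
            by_host.setdefault(host, []).append(url)

    # One discovery task per domain, run concurrently.
    tasks = []
    for host, host_urls in by_host.items():
        scheme = urlparse(host_urls[0]).scheme or 'https'
        tasks.append(asyncio.ensure_future(discover_for_domain(f'{scheme}://{host}', host_urls)))

    # Yield sitemap URLs as each domain finishes, deduplicating globally.
    seen: set[str] = set()
    for finished in asyncio.as_completed(tasks):
        for sitemap_url in await finished:
            if sitemap_url not in seen:
                seen.add(sitemap_url)
                yield sitemap_url
```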