gh-150821: Skip URL parsing in mimetypes.guess_type() for file paths#150828

Open
gaborbernat wants to merge 1 commit into python:main from gaborbernat:opt/mimetypes-skip-urlparse

Conversation

@gaborbernat
Contributor

@gaborbernat gaborbernat commented Jun 2, 2026

mimetypes.guess_type() accepts either a URL or a filesystem path, so it parses its argument as a URL with urllib.parse.urlparse() before looking at the extension. The common argument is a plain file path, which has no URL scheme to find, so the parse — and the urllib.parse import it triggers — is spent on nothing. Guessing content types from file names is everywhere: static-file servers, upload handlers, archive and build tools deciding how to treat each file as they walk a tree of thousands.

A URL scheme requires a :, so a path without one cannot be a URL. This detects that case and goes straight to extension lookup, skipping urlparse() and its lazy import. Real URLs, and the rare path that contains a :, still take the full parsing path, and results are unchanged for both.
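
The reasoning above can be checked with a small sketch. It is not the patch itself: `may_have_scheme` is a hypothetical helper introduced only for illustration, and `example.com` is a placeholder URL. It shows that for strings without a `:`, `urllib.parse.urlparse()` reports an empty scheme and returns the input unchanged as the path, so the parse adds nothing for such names.

```python
from urllib.parse import urlparse


def may_have_scheme(arg: str) -> bool:
    """Hypothetical helper: only a string containing ':' can carry a URL scheme."""
    return ":" in arg


# File names taken from the benchmark list in this description.
plain_names = ["README.md", "tox.ini", ".flake8", "setup.cfg"]

for name in plain_names:
    parsed = urlparse(name)
    # No ':' means urlparse() finds no scheme and hands back the input as the path.
    assert not may_have_scheme(name)
    assert parsed.scheme == ""
    assert parsed.path == name

# A real URL contains ':' and still needs the full parse.
url = "https://example.com/static/logo.png"  # placeholder URL
assert may_have_scheme(url)
assert urlparse(url).scheme == "https"
```

The check is deliberately conservative: a `:` does not prove the argument is a URL, it only means the fast path cannot rule one out.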

Guessing types for 15 real file names sampled from the top-1000 corpus improves from 23.4 µs to 11.0 µs (112% faster, about 2.1x).

| Benchmark | base | patched |
|---|---|---|
| guess_type x15 file paths | 23.4 µs | 11.0 µs: 112% faster |
Benchmark (pyperf)

Run base vs patched by swapping Lib/mimetypes.py on the same interpreter. The names are real file names sampled from the top-1000 corpus.

import mimetypes, pyperf
mimetypes.init()

names = ["webhook_list.py", "tox.ini", "api_management_delete_policy.py",
    ".env.sample.entra-id", "alerts_get_by_id.py", "ai_prompt_workflow.md",
    "functions.py", "sample_connections.py", "certificate_delete.py",
    "_ai_agents_instrumentor.py", ".flake8", "agent_trace_configurator.py",
    "test_ws_invoke.py", "README.md", "setup.cfg"]

runner = pyperf.Runner()
runner.bench_func("guess_type x15 file paths",
                  lambda: [mimetypes.guess_type(n) for n in names])

Resolves #150821.

…paths

guess_type() parsed every argument as a URL before checking the extension,
even for plain file paths that have no scheme. Detect the no-scheme case and
go straight to extension lookup, avoiding urlparse() and its lazy import. Real
URLs keep the full parsing path; results are unchanged.
@gaborbernat gaborbernat force-pushed the opt/mimetypes-skip-urlparse branch from d35cf04 to 175764e Compare June 2, 2026 23:45
Member

@sobolevn sobolevn left a comment

Thanks!

Comment thread Lib/mimetypes.py
# A URL scheme requires a ':'; a plain file path (the common case) has
# none, so skip the relatively expensive urlparse() for it.
if isinstance(url, str) and ':' not in url:
    return self.guess_file_type(url, strict=strict)
Member

Suggested change
return self.guess_file_type(url, strict=strict)
return self.guess_file_type(url, strict=strict)

style nit :)

Comment thread Lib/mimetypes.py
    scheme = p.scheme
    url = p.path
else:
    return self.guess_file_type(url, strict=strict)
Member

question: can this branch still happen now? do we have tests for this case?
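
A minimal sketch of inputs that could still reach this `else` branch. It assumes the condition guarding the branch is the `p.scheme and len(p.scheme) > 1` test, which is not shown in the quoted diff; `reaches_else_branch` is a hypothetical helper and the sample paths are made up for illustration.

```python
from urllib.parse import urlparse


def reaches_else_branch(arg: str) -> bool:
    """Hypothetical mirror of the assumed scheme test.

    Assumption: the branch is taken when urlparse() finds no scheme,
    or only a one-character scheme such as a Windows drive letter.
    """
    scheme = urlparse(arg).scheme
    return not (scheme and len(scheme) > 1)


# All three contain ':', so none of them would take the new fast path.
drive_path = "C:\\Users\\me\\notes.txt"   # one-letter "scheme" (drive letter)
nested_path = "dir/a:b.txt"               # '/' before ':' rules out a scheme
real_url = "https://example.com/a.txt"    # placeholder URL with a real scheme

for candidate in (drive_path, nested_path, real_url):
    print(candidate, reaches_else_branch(candidate))
```

Under that assumption, the drive-letter path and the nested path still fall through to the file-type lookup, so they would be natural candidates for test cases covering this branch.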

Comment thread Lib/mimetypes.py
Comment on lines +129 to +130
# A URL scheme requires a ':'; a plain file path (the common case) has
# none, so skip the relatively expensive urlparse() for it.
Member

This isn't strictly true: `:` is accepted in file names on Linux, e.g.:

$ touch fi:le
$ realpath fi\:le 
/tmp/fi:le
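
A small sketch of how such a name is treated. `takes_fast_path` is a hypothetical helper that mirrors the check quoted from the diff; it is not part of `mimetypes`. A name like `fi:le` contains a `:`, so it does not take the fast path, and `urllib.parse.urlparse()` reads the part before the colon as a scheme.

```python
from urllib.parse import urlparse


def takes_fast_path(arg) -> bool:
    """Hypothetical mirror of the check quoted from the diff."""
    return isinstance(arg, str) and ":" not in arg


name = "fi:le"
parsed = urlparse(name)

# The colon sends this name down the full parsing path.
print(takes_fast_path(name))

# urlparse() splits at the colon: "fi" becomes the scheme, "le" the path.
print(parsed.scheme, parsed.path)
```

So the comment's wording is too strong as a statement about file paths, while the check itself stays conservative: any argument containing a `:` is still parsed as before.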

Development

Successfully merging this pull request may close these issues.

Speed up mimetypes.guess_type() for plain file paths

3 participants