The data is just a JSON file, so you can use it in any environment. It is sourced from GitHub's Linguist project, which defines all 69 markup languages known to GitHub. The data is updated via script, and each update is released as a new package version.
pip install markup-languages

import markup_languages

html_lang_data = markup_languages['HTML']
print(html_lang_data['extensions']) # => ['.hta', '.htm', '.html', '.html.hl', ...]

Note: Most type checkers will falsely warn that markup_languages is not subscriptable, because they cannot analyze runtime behavior (on import, the module is replaced with a dictionary for cleaner, direct access). You can safely suppress such warnings with # type: ignore.
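To illustrate the note above, here is a minimal sketch of how a module can swap itself for a dictionary at import time. It shows the general sys.modules technique only and is an assumption about the mechanism, not this package's actual source. The module name demo_langs and its data are invented for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical module, demo_langs.py (name and data invented for this demo).
MODULE_SOURCE = '''
import sys

_data = {"HTML": {"extensions": [".htm", ".html"]}}

# Swap this module's entry in sys.modules for the dict itself.
sys.modules[__name__] = _data
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, 'demo_langs.py').write_text(MODULE_SOURCE, encoding='utf-8')
    sys.path.insert(0, tmp_dir)
    try:
        # The import system hands back whatever sits in sys.modules afterwards.
        demo_langs = importlib.import_module('demo_langs')
    finally:
        sys.path.remove(tmp_dir)

print(type(demo_langs).__name__)         # dict
print(demo_langs['HTML']['extensions'])  # ['.htm', '.html']
```

A static checker never runs the module body, so it still treats the imported name as a module and flags the subscript, even though the lookup works at runtime.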
Get language(s) from an extension:
def get_lang(file_ext):
    lang_matches = [
        lang for lang, data in markup_languages.items()
        if file_ext in data['extensions']
    ]
    return lang_matches[0] if len(lang_matches) == 1 else lang_matches

print(get_lang('.sss')) # => SugarSS

Get language(s) from a file path:
from pathlib import Path

def get_lang_from_path(filepath):
    # Path.suffix returns only the final extension, e.g. '.html'
    file_ext = Path(filepath).suffix
    lang_matches = [
        lang for lang, data in markup_languages.items()
        if file_ext in data['extensions']
    ]
    return lang_matches[0] if len(lang_matches) == 1 else lang_matches

print(get_lang_from_path('index.html')) # => ['Ecmarkup', 'HTML']
print(get_lang_from_path('style.css')) # => CSS
print(get_lang_from_path('script.js')) # => [] (use programming-languages pkg)

Copyright © 2026 Adam Lui
🇨🇳 non-latin-locales - ISO 639-1 (2-letter) codes for non-Latin locales.
#! programming-languages - File extensions for programming languages.
🏷️ project-markers - Common project root markers.


