A Study of the Categories used in Papers with Code

An increasing number of developers share research software online to support scientific investigations. To improve software findability, the scientific community has developed domain-specific taxonomies, yet the extent to which they are adopted remains unclear. This paper evaluates a set of software categories by introducing a comparative framework based on state-of-the-art text similarity techniques (TF-IDF, Sentence-BERT, CLIP). Using Papers with Code as a case study, we assess the degree of overlap between the software categories defined on the platform, based on the method descriptions they contain. Our results show significant overlap between categories, which may limit the effectiveness of classification algorithms. While community-defined categories provide a useful foundation, they may require refinement, such as subcategories or clearer definitions, to better capture interdisciplinary methods and improve classification accuracy.
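The overlap analysis described above can be illustrated with its simplest variant, TF-IDF. The following is a minimal sketch and not the authors' pipeline. It assumes that each category is represented by the concatenation of its method descriptions, that IDF is smoothed the way scikit-learn does by default, and that overlap is measured as the pairwise cosine similarity between category vectors. The function names (`tfidf_vectors`, `category_similarity`) and the example categories and descriptions are hypothetical. Sentence-BERT or CLIP would replace the TF-IDF vectorizer with learned embeddings, while the pairwise comparison would stay the same.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_similarity(categories):
    """Map each unordered pair of category names to its cosine similarity.

    `categories` maps a category name to a list of method descriptions
    (hypothetical input format).
    """
    names = sorted(categories)
    # One document per category: all of its method descriptions joined.
    docs = [" ".join(categories[name]) for name in names]
    vecs = tfidf_vectors(docs)
    return {
        (a, b): cosine(vecs[i], vecs[j])
        for i, a in enumerate(names)
        for j, b in enumerate(names)
        if i < j
    }


if __name__ == "__main__":
    # Hypothetical categories and method descriptions, for illustration only.
    example = {
        "Computer Vision": ["convolutional layers extract image features",
                            "attention over image patches"],
        "Natural Language Processing": ["attention over token sequences",
                                        "embeddings for words"],
        "Graphs": ["message passing over graph nodes"],
    }
    for pair, score in sorted(category_similarity(example).items(),
                              key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In this toy input, the Computer Vision and Natural Language Processing categories share the term "attention" and therefore form the most similar pair. On real data, a high score between two categories would be read as a sign that their method descriptions overlap.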

Repository contents:
data/pwc
notebooks
plots/rq1
CITATION.cff
LICENSE
README.md
requirements.txt