Separated unreachable objects by type in the output.
Scott Arbeit committed Apr 16, 2025
commit 99ec1bfa62e68c023026ea02b3ad2c37f277061c
71 changes: 38 additions & 33 deletions README.md
@@ -89,66 +89,71 @@ Is your Git repository bursting at the seams?

## Usage

By default, `git-sizer` outputs its results in tabular format. For example, let's use it to analyze [the Linux repository](https://github.com/torvalds/linux), using the `--verbose` option so that all statistics are output:
By default, `git-sizer` outputs its results in tabular format. For example, let's use it to analyze [the Linux repository](https://github.com/torvalds/linux) (as of April, 2025), using the `--verbose` option so that all statistics are output:

```
$ git-sizer --verbose
Processing blobs: 1652370
Processing trees: 3396199
Processing commits: 722647
Matching commits to trees: 722647
Processing annotated tags: 534
Processing references: 539
Processing blobs: 2928490
Processing trees: 6510174
Processing commits: 1351500
Matching commits to trees: 1351500
Processing annotated tags: 877
Processing references: 883

| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| Repository statistics | | |
| * Commits | | |
| * Count | 723 k | * |
| * Total size | 525 MiB | ** |
| * Count | 1.35 M | ** |
| * Total size | 1.11 GiB | **** |
| * Trees | | |
| * Count | 3.40 M | ** |
| * Total size | 9.00 GiB | **** |
| * Total tree entries | 264 M | ***** |
| * Count | 6.51 M | **** |
| * Total size | 19.0 GiB | ********** |
| * Total tree entries | 547 M | ********** |
| * Blobs | | |
| * Count | 1.65 M | * |
| * Total size | 55.8 GiB | ***** |
| * Count | 2.93 M | * |
| * Uncompressed total size | 115 GiB | ************ |
| * On-disk size | | |
| * Compressed total size | 5.68 GiB | ****** |
| * Annotated tags | | |
| * Count | 534 | |
| * Count | 877 | |
| * References | | |
| * Count | 539 | |
| * Count | 883 | |
| * Branches | 1 | |
| * Tags | 880 | |
| * Remote-tracking refs | 2 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 72.7 KiB | * |
| * Maximum parents [2] | 66 | ****** |
| * Trees | | |
| * Maximum entries [3] | 1.68 k | * |
| * Maximum entries [3] | 2.60 k | ** |
| * Blobs | | |
| * Maximum size [4] | 13.5 MiB | * |
| * Maximum size [4] | 22.8 MiB | ** |
| | | |
| History structure | | |
| * Maximum history depth | 136 k | |
| * Maximum history depth | 198 k | |
| * Maximum tag depth [5] | 1 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [6] | 4.38 k | ** |
| * Maximum path depth [7] | 13 | * |
| * Maximum path length [8] | 134 B | * |
| * Number of files [9] | 62.3 k | * |
| * Total size of files [9] | 747 MiB | |
| * Number of symlinks [10] | 40 | |
| * Number of directories [6] | 5.89 k | ** |
| * Maximum path depth [6] | 14 | * |
| * Maximum path length [7] | 134 B | * |
| * Number of files [8] | 88.7 k | * |
| * Total size of files [9] | 1.41 GiB | * |
| * Number of symlinks [6] | 78 | |
| * Number of submodules | 0 | |

[1] 91cc53b0c78596a73fa708cceb7313e7168bb146
[2] 2cde51fbd0f310c8a2c5f977e665c0ac3945b46d
[3] 4f86eed5893207aca2c2da86b35b38f2e1ec1fc8 (refs/heads/master:arch/arm/boot/dts)
[4] a02b6794337286bc12c907c33d5d75537c240bd0 (refs/heads/master:drivers/gpu/drm/amd/include/asic_reg/vega10/NBIO/nbio_6_1_sh_mask.h)
[3] ac1d84c335bcbd5fc5d82b8e985d8a9cc4c67d79 (6a1d798feb65d2a67e6e2cafb0b0e4f430603226:arch/arm/boot/dts)
[4] c20bf730dc553e5ae44ad9e769b1f8dface9fa9e (refs/heads/master:drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_2_0_sh_mask.h)
[5] 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c (refs/tags/v2.6.11)
[6] 1459754b9d9acc2ffac8525bed6691e15913c6e2 (589b754df3f37ca0a1f96fccde7f91c59266f38a^{tree})
[7] 78a269635e76ed927e17d7883f2d90313570fdbc (dae09011115133666e47c35673c0564b0a702db7^{tree})
[8] ce5f2e31d3bdc1186041fdfd27a5ac96e728f2c5 (refs/heads/master^{tree})
[9] 532bdadc08402b7a72a4b45a2e02e5c710b7d626 (e9ef1fe312b533592e39cddc1327463c30b0ed8d^{tree})
[10] f29a5ea76884ac37e1197bef1941f62fda3f7b99 (f5308d1b83eba20e69df5e0926ba7257c8dd9074^{tree})
[6] 549fc717f82345cf115dfa586ce076a8d1f296a6 (refs/heads/master^{tree})
[7] b0da5ce619daec8138cf92dfcf00e7a51ce856a9 (d8763340d2cb6262fb86424315a1f92cabc0e23c^{tree})
[8] fd94fec4e9c4e08df8e919e57fcc974c52c88c3c (3491aa04787f4d7e00da98d94b1b10001c398b5a^{tree})
[9] 80e16948c5baba02ea2eeda7aa4b2478b68bbaf0 (524c03585fda36584cc7ada49a1827666d37eb4e^{tree})
```

The output is a table showing the thing that was measured, its numerical value, and a rough indication of which values might be a cause for concern. In all cases, only objects that are reachable from references are included (i.e., not unreachable objects, nor objects that are reachable only from the reflogs).
15 changes: 12 additions & 3 deletions git-sizer.go
@@ -47,7 +47,7 @@ const usage = `usage: git-sizer [OPTS] [ROOT...]
gitconfig: 'sizer.jsonVersion'.
--[no-]progress report (don't report) progress to stderr. Can
be set via gitconfig: 'sizer.progress'.
--include-unreachable include unreachable objects
--include-unreachable include unreachable objects in the analysis
--version only report the git-sizer version number

Object selection:
@@ -353,8 +353,15 @@ func mainImplementation(ctx context.Context, stdout, stderr io.Writer, args []st
historySize.ShowUnreachable = true
unreachableStats, err := repo.GetUnreachableStats()
if err == nil {
historySize.UnreachableObjectCount = counts.Count64(unreachableStats.Count)
historySize.UnreachableObjectSize = counts.Count64(unreachableStats.Size)
// Store per-type unreachable stats for output
historySize.UnreachableBlobsCount = counts.Count64(unreachableStats.Blobs.Count)
historySize.UnreachableBlobsSize = counts.Count64(unreachableStats.Blobs.Size)
historySize.UnreachableTreesCount = counts.Count64(unreachableStats.Trees.Count)
historySize.UnreachableTreesSize = counts.Count64(unreachableStats.Trees.Size)
historySize.UnreachableCommitsCount = counts.Count64(unreachableStats.Commits.Count)
historySize.UnreachableCommitsSize = counts.Count64(unreachableStats.Commits.Size)
historySize.UnreachableTagsCount = counts.Count64(unreachableStats.Tags.Count)
historySize.UnreachableTagsSize = counts.Count64(unreachableStats.Tags.Size)
}
}

@@ -374,6 +381,8 @@ func mainImplementation(ctx context.Context, stdout, stderr io.Writer, args []st
}
fmt.Fprintf(stdout, "%s\n", j)
} else {
// Print a blank line between progress output and the table
fmt.Fprintln(stdout)
if _, err := io.WriteString(
stdout, historySize.TableString(rg.Groups(), threshold, nameStyle),
); err != nil {
88 changes: 67 additions & 21 deletions git/git.go
@@ -177,45 +177,91 @@ func (repo *Repository) GitPath(relPath string) (string, error) {
return string(bytes.TrimSpace(out)), nil
}

// UnreachableStats holds the count and size of unreachable objects.
// UnreachableStats holds the count and size of unreachable objects, broken out by type.
type UnreachableStats struct {
Count int64
Size int64
Blobs struct {
Count int64
Size int64
}
Trees struct {
Count int64
Size int64
}
Commits struct {
Count int64
Size int64
}
Tags struct {
Count int64
Size int64
}
}

// GetUnreachableStats runs 'git fsck --unreachable --no-reflogs --full'
// and returns the count and total size of unreachable objects.
// This implementation collects all OIDs from fsck output and then uses
// batch mode to efficiently retrieve their sizes.
// GetUnreachableStats runs 'git fsck --unreachable --no-reflogs'
// and returns the count and total size of unreachable objects, broken out by type.
func (repo *Repository) GetUnreachableStats() (UnreachableStats, error) {
// Run git fsck. Output returns only the command's stdout.
cmd := repo.GitCommand("fsck", "--unreachable", "--no-reflogs", "--full")
cmd := repo.GitCommand("fsck", "--unreachable", "--no-reflogs")
output, err := cmd.Output()
if err != nil {
return UnreachableStats{Count: 0, Size: 0}, fmt.Errorf(
"running 'git fsck --unreachable --no-reflogs --full': %w", err,
return UnreachableStats{}, fmt.Errorf(
"running 'git fsck --unreachable --no-reflogs': %w", err,
)
}

var oids []string
count := int64(0)
// Collect OIDs by type

oidsByType := map[string][]string{
"blob": {},
"tree": {},
"commit": {},
"tag": {},
}
countsByType := map[string]int64{
"blob": 0,
"tree": 0,
"commit": 0,
"tag": 0,
}

for _, line := range bytes.Split(output, []byte{'\n'}) {
fields := bytes.Fields(line)
// Expected line format: "unreachable <type> <oid> ..."
if len(fields) >= 3 && string(fields[0]) == "unreachable" {
count++
oid := string(fields[2])
oids = append(oids, oid)
typeStr := string(fields[1])
if _, ok := oidsByType[typeStr]; ok {
oid := string(fields[2])
oidsByType[typeStr] = append(oidsByType[typeStr], oid)
countsByType[typeStr]++
}
}
}

// Retrieve the total size using batch mode.
totalSize, err := repo.getTotalSizeFromOids(oids)
if err != nil {
return UnreachableStats{}, fmt.Errorf("failed to get sizes via batch mode: %w", err)
var stats UnreachableStats
var errBlob, errTree, errCommit, errTag error
stats.Blobs.Count = countsByType["blob"]
stats.Trees.Count = countsByType["tree"]
stats.Commits.Count = countsByType["commit"]
stats.Tags.Count = countsByType["tag"]

stats.Blobs.Size, errBlob = repo.getTotalSizeFromOids(oidsByType["blob"])
stats.Trees.Size, errTree = repo.getTotalSizeFromOids(oidsByType["tree"])
stats.Commits.Size, errCommit = repo.getTotalSizeFromOids(oidsByType["commit"])
stats.Tags.Size, errTag = repo.getTotalSizeFromOids(oidsByType["tag"])

if errBlob != nil {
return stats, fmt.Errorf("failed to get blob sizes: %w", errBlob)
}
if errTree != nil {
return stats, fmt.Errorf("failed to get tree sizes: %w", errTree)
}
if errCommit != nil {
return stats, fmt.Errorf("failed to get commit sizes: %w", errCommit)
}
if errTag != nil {
return stats, fmt.Errorf("failed to get tag sizes: %w", errTag)
}

return UnreachableStats{Count: count, Size: totalSize}, nil
return stats, nil
}
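
The per-type tallying in `GetUnreachableStats` can be tried out in isolation. The following is a minimal sketch, not the code from this commit: `countUnreachableByType` is a hypothetical helper, and the sample text and OIDs are made up. It assumes only the line format named in the comment above, `unreachable <type> <oid>`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countUnreachableByType (hypothetical helper) tallies lines of the form
// "unreachable <type> <oid>" by object type. Lines that do not start with
// "unreachable", or that name an unknown type, are ignored.
func countUnreachableByType(fsckOutput string) map[string]int64 {
	counts := map[string]int64{"blob": 0, "tree": 0, "commit": 0, "tag": 0}
	scanner := bufio.NewScanner(strings.NewReader(fsckOutput))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 || fields[0] != "unreachable" {
			continue
		}
		if _, known := counts[fields[1]]; known {
			counts[fields[1]]++
		}
	}
	return counts
}

func main() {
	// Made-up sample; these OIDs do not refer to real objects.
	sample := strings.Join([]string{
		"unreachable blob 1111111111111111111111111111111111111111",
		"unreachable blob 2222222222222222222222222222222222222222",
		"unreachable commit 3333333333333333333333333333333333333333",
		"dangling tree 4444444444444444444444444444444444444444",
		"",
	}, "\n")
	counts := countUnreachableByType(sample)
	fmt.Println(counts["blob"], counts["tree"], counts["commit"], counts["tag"])
}
```

Pre-seeding the map with the four known types means an unexpected type string is dropped rather than creating a new bucket, which mirrors the `oidsByType` lookup in the diff.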

// getTotalSizeFromOids uses 'git cat-file --batch-check' to retrieve sizes for
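
The body of `getTotalSizeFromOids` is not part of this diff. As a rough illustration of the size-summing step it refers to, here is a sketch that totals the sizes in the default `git cat-file --batch-check` output, whose lines have the form `<oid> <type> <size>`. `sumBatchCheckSizes` is a hypothetical name, the sample input is invented, and the real function may differ in how it feeds OIDs to Git and how it treats missing objects.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumBatchCheckSizes (hypothetical helper) adds up the sizes reported in
// default 'git cat-file --batch-check' output. Lines that do not have
// exactly three fields, such as "<oid> missing", are skipped.
func sumBatchCheckSizes(batchOutput string) (int64, error) {
	var total int64
	scanner := bufio.NewScanner(strings.NewReader(batchOutput))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 {
			continue
		}
		size, err := strconv.ParseInt(fields[2], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing size %q: %w", fields[2], err)
		}
		total += size
	}
	return total, scanner.Err()
}

func main() {
	// Made-up sample; these OIDs do not refer to real objects.
	sample := "1111111111111111111111111111111111111111 blob 120\n" +
		"2222222222222222222222222222222222222222 blob 30\n" +
		"3333333333333333333333333333333333333333 missing\n"
	total, err := sumBatchCheckSizes(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total)
}
```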
38 changes: 32 additions & 6 deletions sizes/output.go
@@ -613,12 +613,38 @@ func (s *HistorySize) contents(refGroups []RefGroup) tableContents {
if s.ShowUnreachable {
sections = append(sections, S(
"Unreachable objects",
I("unreachableObjectCount", "Count",
"The total number of unreachable objects in the repository",
nil, s.UnreachableObjectCount, metric, "", 1e7),
I("unreachableObjectSize", "Uncompressed total size",
"The total size of unreachable objects in the repository",
nil, s.UnreachableObjectSize, binary, "B", 1e9),
S("Blobs",
I("unreachableBlobsCount", "Count",
"The total number of unreachable blobs in the repository",
nil, s.UnreachableBlobsCount, metric, "", 1.5e6),
I("unreachableBlobsSize", "Uncompressed total size",
"The total size of unreachable blobs in the repository",
nil, s.UnreachableBlobsSize, binary, "B", 1e9),
),
S("Trees",
I("unreachableTreesCount", "Count",
"The total number of unreachable trees in the repository",
nil, s.UnreachableTreesCount, metric, "", 1.5e6),
I("unreachableTreesSize", "Total size",
"The total size of unreachable trees in the repository",
nil, s.UnreachableTreesSize, binary, "B", 2e9),
),
S("Commits",
I("unreachableCommitsCount", "Count",
"The total number of unreachable commits in the repository",
nil, s.UnreachableCommitsCount, metric, "", 500e3),
I("unreachableCommitsSize", "Total size",
"The total size of unreachable commits in the repository",
nil, s.UnreachableCommitsSize, binary, "B", 250e6),
),
S("Tags",
I("unreachableTagsCount", "Count",
"The total number of unreachable tags in the repository",
nil, s.UnreachableTagsCount, metric, "", 25e3),
I("unreachableTagsSize", "Total size",
"The total size of unreachable tags in the repository",
nil, s.UnreachableTagsSize, binary, "B", 250e6),
),
))
}
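
The last argument to each `I(...)` call above (for example `1.5e6` or `250e6`) appears to be a scale for the "Level of concern" column. The sketch below assumes that the number of asterisks is the value divided by that scale, rounded down. That rule is an assumption and is not shown in this diff. It does match the README table in this commit if the reachable-object rows use the same scales (also an assumption): 1.35 M commits at a scale of `500e3` gives two asterisks. `concernStars` is a hypothetical name, and any upper cap or display threshold the real code applies is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// concernStars (hypothetical helper) returns one asterisk per whole multiple
// of scale contained in value. It assumes scale is positive.
func concernStars(value uint64, scale float64) string {
	return strings.Repeat("*", int(float64(value)/scale))
}

func main() {
	fmt.Printf("%q\n", concernStars(1_350_000, 500e3)) // commit count vs. 500e3
	fmt.Printf("%q\n", concernStars(2_930_000, 1.5e6)) // blob count vs. 1.5e6
	fmt.Printf("%q\n", concernStars(877, 25e3))        // tag count vs. 25e3
}
```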

26 changes: 22 additions & 4 deletions sizes/sizes.go
@@ -214,11 +214,29 @@ type HistorySize struct {
// The actual size of the .git directory on disk.
GitDirSize counts.Count64 `json:"git_dir_size"`

// The total number of unreachable objects in the repository.
UnreachableObjectCount counts.Count64 `json:"unreachable_object_count"`
// The total number of unreachable blobs in the repository.
UnreachableBlobsCount counts.Count64 `json:"unreachable_blobs_count"`

// The total size of unreachable objects in the repository.
UnreachableObjectSize counts.Count64 `json:"unreachable_object_size"`
// The total size of unreachable blobs in the repository.
UnreachableBlobsSize counts.Count64 `json:"unreachable_blobs_size"`

// The total number of unreachable trees in the repository.
UnreachableTreesCount counts.Count64 `json:"unreachable_trees_count"`

// The total size of unreachable trees in the repository.
UnreachableTreesSize counts.Count64 `json:"unreachable_trees_size"`

// The total number of unreachable commits in the repository.
UnreachableCommitsCount counts.Count64 `json:"unreachable_commits_count"`

// The total size of unreachable commits in the repository.
UnreachableCommitsSize counts.Count64 `json:"unreachable_commits_size"`

// The total number of unreachable tags in the repository.
UnreachableTagsCount counts.Count64 `json:"unreachable_tags_count"`

// The total size of unreachable tags in the repository.
UnreachableTagsSize counts.Count64 `json:"unreachable_tags_size"`

ShowUnreachable bool `json:"-"`
}
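
To see what the struct tags above imply for JSON consumers, here is a small sketch using Go's standard `encoding/json`. `unreachableFields` is a hypothetical stand-in for two of the new `HistorySize` fields, with `uint64` substituted for `counts.Count64`. How git-sizer actually assembles its JSON output is not shown in this diff, so this only illustrates the tag behavior: the renamed keys appear, and a field tagged `json:"-"` is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unreachableFields is a hypothetical stand-in for part of HistorySize.
// uint64 replaces counts.Count64 so the example is self-contained.
type unreachableFields struct {
	UnreachableBlobsCount uint64 `json:"unreachable_blobs_count"`
	UnreachableBlobsSize  uint64 `json:"unreachable_blobs_size"`

	// Tagged "-" so it never appears in the encoded output.
	ShowUnreachable bool `json:"-"`
}

// encode marshals the struct using the standard library encoder.
func encode(f unreachableFields) (string, error) {
	b, err := json.Marshal(f)
	return string(b), err
}

func main() {
	out, err := encode(unreachableFields{
		UnreachableBlobsCount: 3,
		UnreachableBlobsSize:  4096,
		ShowUnreachable:       true,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the commit replaces `unreachable_object_count` and `unreachable_object_size` with per-type keys, any consumer reading the old keys would need to be updated to the new names.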