| title | Git Rev News Edition 95 (January 31st, 2023) |
|---|---|
| layout | default |
| date | 2023-01-31 12:06:51 +0100 |
| author | chriscool |
| categories | |
| navbar | false |
Welcome to the 95th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the months of December 2022 and January 2023.
- Question: How to execute git-gc correctly on the Git server?
ZheNing Hu asked how he could run `git gc` correctly on his own Git server. He seemed worried by the `git gc` documentation saying that there is a risk of failures and repository corruption when the command is run concurrently with other Git processes.
He said that he had read about `git gc --cruft`, which could overcome these issues, but that he was still using Git v2.35 on his server while `--cruft` was introduced in v2.38.
He also wondered if there was a need for `git gc` to set a repository-level lock blocking most or all other Git operations, and what these operations -- especially `git clone` and `git push` -- should do or report when hitting this lock.
Ævar Arnfjörð Bjarmason replied that running `git gc` on a "live" repo was always racy, but that the odds of corrupting the repo became very small when the value of the `gc.pruneExpire` config option was increased. He said that the default setting for this option, 2 weeks, was "more than enough for even the most paranoid user".
About `--cruft`, Ævar thought that its purpose was not only to avoid possible repo corruption, but also to allow more aggressive gc (garbage collection).
He also wondered if this question was about large hosting sites like GitHub and GitLab, where `git gc` is run on live repos, and suggested not to worry in this case, but to take backups.
Jeff King, alias Peff, replied to Ævar saying he was "a bit less optimistic" about the corruption risk decreasing when `gc.pruneExpire` was increased, because there was no atomic view of the ref namespace. Renaming a branch, for example, was risky because any concurrent process could see it as removing a branch and adding a different one. Such a process could be another `push`, not just a `gc`.
Peff also said that using `--cruft` was not so much about avoiding corruption, but about keeping cruft objects out of the main pack to reduce the cost of lookups and bitmaps, and about avoiding exploding a lot of old objects into loose objects, which could be very bad for performance.
Ævar replied to Peff, discussing further when corruption was likely or not to happen, which issues `--cruft` could help with, and a patch he had sent in the past to reduce possible corruption. He also suggested running `git gc` during the least busy hours of the day.
Later Taylor Blau replied to Ævar and Peff, discussing `--cruft` in the context of single-pack bitmaps or multi-pack (MIDX) bitmaps, and also in the context of GitHub.
In the meantime, Michal Suchánek replied to Ævar's first email asking what the 2 week default expiration time applied to. He also said that he got corrupted repos with fewer than 100 users "and some scripting", and that the corruption went away when `gc` was disabled.
Peff replied to Michal, saying that the expiration time applied to the `mtime` on the object file (or the pack containing it), and confirmed that it was "far from a complete race-free solution".
ZheNing also replied to Michal saying that he preferred "no error at all" to a "small probability of error".
Michal replied to Peff listing some workflows that are more likely to lead to a corrupt repo, like deleting branches but pushing other branches that are variants of these branches, and different people pushing files from the same external source.
Peff confirmed that these workflows were indeed risky, and detailed a bit further how the race conditions can happen.
ZheNing then replied to Peff asking if there was "an easy and poor performance" way, like a lock on a repository, to avoid for example concurrent `push` and `gc` processes.
Ævar replied that there was no such way but that we should have one. He explained that it could perhaps be done using hooks, like 'pre-receive' and 'post-receive', when we were sure that all relevant operations were going through these hooks. (For example no local branch deletion should be possible.)
ZheNing and Michal discussed a bit further the details related to how a repo corruption can happen with concurrent `push` and `gc` processes, and how that could possibly be avoided.
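The settings discussed in this thread can be sketched as follows. This is only an illustration in a throwaway repository: the `repo` and `expire` variables and the temporary directory are assumptions, not something taken from the thread. It makes the two-week `gc.pruneExpire` default explicit and then runs a plain `git gc`. As the participants pointed out, a longer grace period lowers the odds of a race but does not remove it.

```shell
# Hypothetical throwaway repository, used only for illustration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Spell out the default grace period for pruning unreachable objects.
# A larger value (for example 1.month.ago) would widen the safety window.
git config gc.pruneExpire 2.weeks.ago
expire=$(git config --get gc.pruneExpire)
echo "gc.pruneExpire is set to $expire"

# Plain garbage collection; objects newer than the grace period are kept.
git gc --quiet

# On a Git version that supports it, cruft packs can be requested instead:
#   git gc --cruft
```

Scheduling such a run during quiet hours, as Ævar suggested, would typically be left to a job scheduler on the server.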
- Who are you and what do you do?
My work is related to R&D efficiency tools development at Alibaba Cloud. Our team has built a code hosting service at codeup.aliyun.com, which provides free and high-quality code services for Chinese developers on the public cloud. In addition, I used to be a Gerrit contributor, because I wrote Java for nearly 10 years, and this process made me almost forget the C language, LOL.
As for contributions to the Git community, apart from me, Jiang Xin (the Git localization coordinator), ZheNing Hu, and Chen BoJun are also in the team.
- What would you name your most important contribution to Git?
First of all, I have known Git for some years, but I'm new to the community. Git's technical depth is obvious: it involves algorithms, operating systems, testing techniques, etc. Also, Git has many subcommands, which makes the implementation of Git itself involve many aspects, and I think it is difficult for a new contributor to understand everything, but long-term participation may make you an expert in one aspect of Git. Sadly, my time devoted to the Git community is actually limited.
I contributed a feature last year to allow the `git ls-tree` subcommand to support the `--format` option, which lets you print the result the way you want; I think this is helpful for automated tools or scripted work. If you want to know more about it, a better way is to read the blog by Taylor Blau.
- What are you doing on the Git project these days, and why?
I've been following the evolution of the `bundle-uri` feature recently; I think the idea of this feature is great and attractive. If used properly, it can not only improve the speed of code download in some scenarios, but also reduce the load on the server.
I'm also reading algorithm-related code (like bitmap, multi-pack bitmap, bloom-filter), as I want to know some details about the combination of Git and algorithms. I think it's interesting.
- If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?
We all know that it can be a pain in terms of resource load and cost to provide large-scale Git services. I hope to be able to solve the problem of Git's coupling of storage and computing, to let Git integrate better with cloud-native architecture. For example, should it be possible to store the refs, loose objects and packs in a distributed database?
I think this is one of the future development directions of the Git architecture, starting from lower cost and cloud friendliness. If you want to do these tasks based on Git, you may need to make the related internal implementations more adaptable, which requires a lot of professional work I think.
- If you could remove something from Git without worrying about backwards compatibility, what would it be?
Maybe introduce a new option `--branches` in `git push` to replace `--all`. Option `--all` means to push all branches, `--tags` means to push all tags, but many people misunderstand it (at least those around me), because they think `--all` means to push all the branches and tags together. In fact, I made an RFC patch before, hoping to support the `--branches` parameter as a first step, and I'll consider following up with this patch.
- What is your favorite Git-related tool/library, outside of Git itself?
I prefer git-repo which supports doing code reviews or pull requests on the client, just like using a native Git subcommand.
- Do you happen to have any memorable experience w.r.t. contributing to the Git project? If yes, could you share it with us?
I still remember when my first commit was merged in, even though it was a small fix. This process made me understand that contributing to Git is completely different from other workflows, and the process and results both feel good.
- What is your toolbox for interacting with the mailing list and for development of Git?
First, I use https://public-inbox.org/git/?q=a%3Adyroneteng to check if there are any new mails related to me.
Then, I've been using `git format-patch` to create patchsets and `git send-email` to post them, and `git am` for local reviews. I don't know if there's a better way, but it seems to be enough for me.
- What is your advice for people who want to start Git development? Where and how should they start?
Contributing to Git is not an easy task; after all, you are working with other excellent contributors in the community. But continuous understanding and participation may make you an expert in a certain direction.
- If there's one tip you would like to share with other Git developers, what would it be?
I think it would be "get used to the process of contribution slowly".
The review process is sometimes frustrating, but most of the suggestions by reviewers are still valuable; you can learn a lot from the process, then you can better participate in the next contribution.
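Two commands from the answers above can be tried in a throwaway setup. The sketch below is only an illustration: the repository paths, the `README` file, the `v1.0` tag and the variable names are hypothetical, and `git ls-tree --format` needs a Git recent enough to include that option. It prints a custom listing with `--format`, then shows that `git push --all` sends branches only, while tags need `--tags`.

```shell
# Throwaway repositories under a temporary directory (hypothetical names).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"

# One commit and one lightweight tag to work with.
echo "hello" > README
git add README
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "Add README"
git tag v1.0

# ls-tree --format: choose which fields to print for each tree entry.
listing=$(git ls-tree --format='%(objecttype) %(path)' HEAD)
echo "$listing"

# push --all sends every branch, but no tags.
git push -q --all "$work/remote.git"
tags_after_all=$(git -C "$work/remote.git" tag --list)

# push --tags is what sends the tags.
git push -q --tags "$work/remote.git"
tags_after_tags=$(git -C "$work/remote.git" tag --list)

echo "tags after --all: '$tags_after_all'; after --tags: '$tags_after_tags'"
```

The first `echo` prints `blob README`; the second shows an empty tag list after `--all` and `v1.0` after `--tags`.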
- Git 2.39.1 and others
- Git for Windows 2.39.1(1)
- libgit2 1.5.1
- GitLab 15.8, 15.7.5, 15.6.6, and 15.5.9, 15.7.3, 15.7.2, 15.7.1
- Bitbucket Server 8.7
- GitHub Enterprise 3.7.4, 3.6.7, 3.5.11, 3.4.14, 3.3.19, 3.7.3, 3.6.6, 3.5.10, 3.4.13, 3.3.18
- GitKraken 9.0.1
- GitHub Desktop 3.1.5, 3.1.4
Various
- Git security vulnerabilities announced that affect versions 2.39 and older. Fixes were authored by engineers from the GitLab Security Research Team, as well as GitHub Engineers, and members of the git-security mailing list.
- Two of three vulnerabilities were found as part of an audit of the Git codebase conducted by X41. This audit was sponsored by the Open Source Technology Improvement Fund (OSTIF). X41 have also published information about this Security Audit of Git.
- Git security audit: Inside the hunt for - and discovery of - CVEs by Joern Schneeweisz on GitLab Blog.
- This Week In Security: Git Deep Dive, Mailchimp, And SPF by Jonathan Bennett on Hackaday, and A security audit of Git brief on LWN.net.
- GitHub Sponsors will stop supporting PayPal, starting on February 23, 2023.
- GitHub is sunsetting Subversion support. On January 8, 2024, GitHub will remove support for Subversion.
- XetHub raises $7.5M for its Git-based data collaboration platform by Frederic Lardinois on TechCrunch.
Light reading
- Beyond Git: The other version control systems developers use by Ryan Donovan on The Overflow (stackoverflow.blog).
- Never write a commit message again (with the help of GPT-3) by Roger Zurawicki on his blog (though read the proposed commit message before accepting it, please).
- Sending a kernel patch with b4 (part 1) by Konstantin Ryabitsev (sending patches with b4 is described in the Contributor overview section of the tool documentation).
- Introducing b4 and patch attestation was mentioned in Git Rev News Edition #61.
- The GitHub Silverware Drawer Dilemma, Or: Finding Active Repository Forks by Maya Posch on Hackaday points to projects that help to find the most active fork.
- 7 Git articles every open source practitioner should read by AmyJune Hineline from RedHat on OpenSource.com.
- Understanding Git through images by kataoka_nopeNoshishi on DEV.to.
- Querying the GitHub archive with the ClickHouse Playground by Simon Willison on Simon Willison’s TILs (Today I've Learned).
- 7 tips for improving your productivity with Git by Daniel Genezini on his "It works on my machine" blog (also on DEV.to).
- Fix that damn Git Unsafe Repository by Rick Strahl on Rick Strahl's Weblog.
- Mastering the Art of Writing Effective Git Commit Messages by Ashish Patel on DEV.to.
- 11 tips for writing a good Git commit message by AmyJune Hineline from RedHat on OpenSource.com.
- 20 Git Commands you (probably) didn't know about by Alicia Sykes on DEV.to.
- How to Checkout a Remote Git Branch by Dave McKay on How-To Geek; though the article misses the DWIM `git checkout <remote-branch>` trick, and does not mention the newer `git switch <branch>` command as an alternative to `git checkout <branch>`.
- Git tutorials - understanding of rebase and merge by Joonhyeok Ahn (Joon) on DEV.to is the final part in the 4 part Git Cookbook series.
- Golang (and thus `git-lfs`) is evil on shitty networks on the Somewhere Within Boredom blog (may be fixed by the time you are reading this).
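The DWIM behaviour mentioned in the How-To Geek entry above can be sketched as follows, assuming a Git that ships `git switch`. The `upstream` and `clone` repositories, the `topic` branch and the variable names are hypothetical, used for illustration only.

```shell
# Hypothetical "upstream" repository with an extra branch named topic.
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "Initial commit"
git branch topic

# Clone it; the clone only knows the branch as origin/topic.
cd "$work"
git clone -q upstream clone
cd clone

# No local topic branch exists yet, but exactly one remote has one, so
# Git creates a local branch that tracks origin/topic and switches to it.
# `git checkout topic` would behave the same way here.
git switch -q topic

current=$(git symbolic-ref --short HEAD)
tracked=$(git rev-parse --abbrev-ref 'topic@{upstream}')
echo "$current tracks $tracked"
```

With this setup the final line prints `topic tracks origin/topic`.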
Git tools and sites
- Git-Sim: Visually Simulate Git Operations In Your Own Repos. Run a one-liner `git-sim` command in the terminal, for example `git-sim reset HEAD^` or `git-sim merge dev`, to generate a custom Git command visualization (.jpg, .mp4) from your repository. Written in Python, available as a package on PyPI.
- heatwave is a tool to visualize your Git commits with a heat map in the terminal, similar to how GitHub's heat map looks. Written in Python, also available as a package on PyPI.
- git-stats is a similar tool to visualize local git statistics, including GitHub-like contributions calendars. Written in JavaScript + HTML, available as an npm.js package.
- Note that Git-Stats, also known as GitStats.me, is an unrelated open-source GitHub contribution analyzer as a web service, which was mentioned in Git Rev News Edition #63.
- scmrepo by Iterative is an SCM wrapper and fsspec filesystem for Git for use in DVC. Works with multiple backends: pygit2 (libgit2), Dulwich, and GitPython.
- DVC (Data Version Control) was first mentioned in Git Rev News Edition #42.
- gptcommit is a git prepare-commit-msg hook for authoring commit messages with the GPT-3 language model. Written in Rust. Note: you need to ensure you have sufficient credits in your OpenAI account to use it.
- There are a few software forges working on implementing ForgeFed and/or ActivityPub federation. ForgeFed (formerly GitPub) is a federation protocol for forge services, first mentioned in Git Rev News Edition #69 in 2020.
- Vervis is a project hosting and management application, with a focus on software projects and decentralization. Self-hosted on the https://vervis.peers.community instance. Supports Git and Darcs, and ForgeFed/ActivityPub federation. It is currently very much a work in progress. Written in Haskell.
- ForgeFlux is API-space software forge federation with ForgeFed for Gitea, Sourcehut, GitLab, and GitHub. Repositories are (also) hosted on GitHub. Note: the project homepage on https://forgeflux.org/ seems to be down at the time of writing.
- Forgefriends is a self-hosted forge federation project, whose purpose is to allow every Free Software developer to use their favorite forge to contribute to software projects hosted on other forges. Forgefriends is written in Go to share code with Gitea, and synchronization is done via the W3C ActivityPub protocol. It is currently in pre-alpha stage.
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Teng Long.