title: Git Rev News Edition 95 (January 31st, 2023)
layout: default
date: 2023-01-31 12:06:51 +0100
author: chriscool
categories: news
navbar: false

Git Rev News: Edition 95 (January 31st, 2023)

Welcome to the 95th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of December 2022 and January 2023.

Discussions

Support

  • Question: How to execute git-gc correctly on the Git server?

    ZheNing Hu asked how he could run git gc correctly on his own Git server. He seemed worried by the git gc documentation, which says that there is a risk of failures and repository corruption when the command is run concurrently with other Git processes.

    He said that he read about git gc --cruft which could overcome these issues, but that he was still using Git v2.35 on his server while --cruft was introduced in v2.38.

    He also wondered whether git gc needed to set a repository-level lock blocking most or all other Git operations, and what these operations, especially git clone and git push, should do or report when hitting this lock.

    Ævar Arnfjörð Bjarmason replied that running git gc on a "live" repo was always racy, but that the odds of corrupting the repo became very small as the value of the gc.pruneExpire config option was increased. He said that the default setting for this option, 2 weeks, was "more than enough for even the most paranoid user".

    About --cruft, Ævar thought that its purpose was not only to avoid possible repo corruption, but also to allow more aggressive gc (garbage collection).

    He also wondered whether this question was about large hosting sites like GitHub and GitLab, where git gc is run on live repos, and suggested not worrying in that case, but taking backups.

    Jeff King, alias Peff, replied to Ævar saying he was "a bit less optimistic" that the corruption risk decreased when gc.pruneExpire was increased, because there is no atomic view of the ref namespace. For example, renaming a branch is risky, because any concurrent process could see it as the removal of one branch and the addition of a different one. Such a process could be another push, not just a gc.

    Peff also said that using --cruft was not so much about avoiding corruption, but about keeping cruft objects out of the main pack to reduce the cost of lookups and bitmaps, and about avoiding exploding many old objects into loose objects, which could be very bad for performance.

    Ævar replied to Peff, discussing further when corruption was or wasn't likely to happen, which issues --cruft could help with, and a patch he had sent in the past to reduce possible corruption. He also suggested running git gc during the least busy hours of the day.

    Later Taylor Blau replied to Ævar and Peff, discussing --cruft in the context of single-pack or multi-pack (MIDX) bitmaps, and also in the context of GitHub.

    In the meantime, Michal Suchánek replied to Ævar's first email, asking what the 2-week default expiration time applied to. He also said that he had gotten corrupted repos with fewer than 100 users "and some scripting", and that the corruption went away when gc was disabled.

    Peff replied to Michal, saying that the expiration time applied to the mtime on the object file (or the pack containing it), and confirmed that it was "far from a complete race-free solution".

    ZheNing also replied to Michal saying that he preferred "no error at all" to a "small probability of error".

    Michal replied to Peff listing some workflows that are more likely to lead to a corrupt repo, like deleting branches while pushing other branches that are variants of them, or different people pushing files from the same external source.

    Peff confirmed that these workflows were indeed risky, and detailed a bit further how the race conditions can happen.

    ZheNing then replied to Peff asking whether there was "an easy and poor performance" way, like a lock on the repository, to prevent, for example, push and gc processes from running concurrently.

    Ævar replied that there was no such way, but that we should have one. He explained that it could perhaps be done using hooks, like 'pre-receive' and 'post-receive', as long as we could be sure that all relevant operations went through these hooks. (For example, it should not be possible to delete branches locally.)

    ZheNing and Michal discussed in more detail how repo corruption can happen with concurrent push and gc processes, and how it could possibly be avoided.
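To make Peff's point about the missing atomic view of the ref namespace more concrete, here is a toy Python model. It is not Git's gc code and makes simplifying assumptions: refs point directly at objects (real gc walks the whole history reachable from every ref), a gc-like scan reads ref names one by one in sorted order from the live namespace, and the default gc.pruneExpire of 2 weeks applies. A concurrent rename of `topic` to `feature` lands between two reads, so the scan never sees commit `c2`. Because that commit's mtime is old, the grace period does not protect it.

```python
"""Toy model of the gc / ref-update race (a sketch, not Git's real code)."""
from typing import Callable, Dict, Optional, Set

TWO_WEEKS = 14 * 24 * 60 * 60  # default gc.pruneExpire ("2.weeks.ago"), in seconds


def scan_live_refs(refs: Dict[str, str],
                   after_each_read: Callable[[str], None]) -> Set[str]:
    """Visit refs in sorted name order WITHOUT taking a snapshot.

    Each step looks up the next name greater than the previous one in the
    *current* dict, so a concurrent writer (simulated by `after_each_read`)
    can change the namespace between two reads.
    """
    seen: Set[str] = set()
    last: Optional[str] = None
    while True:
        later = [name for name in refs if last is None or name > last]
        if not later:
            return seen
        last = min(later)
        seen.add(refs[last])
        after_each_read(last)


def prunable(objects_mtime: Dict[str, float], reachable: Set[str],
             now: float, expire: float = TWO_WEEKS) -> Set[str]:
    """Objects a gc would delete: unreachable AND older than the grace period."""
    return {oid for oid, mtime in objects_mtime.items()
            if oid not in reachable and mtime < now - expire}


def demo() -> Set[str]:
    now = 1_000_000_000.0
    refs = {"main": "c1", "topic": "c2"}
    # Both commits were written (or packed) a month ago, so their mtimes are old.
    mtimes = {"c1": now - 30 * 86400, "c2": now - 30 * 86400}

    def concurrent_rename(just_read: str) -> None:
        # "git branch -m topic feature" as seen by another process:
        # one ref disappears and a different one appears.
        if just_read == "main" and "topic" in refs:
            refs["feature"] = refs.pop("topic")

    reachable = scan_live_refs(refs, concurrent_rename)
    return prunable(mtimes, reachable, now)


if __name__ == "__main__":
    # c2 is still referenced (now by "feature"), but the scan missed it.
    print(demo())  # prints {'c2'}
```

In this model, taking an atomic snapshot of all refs before walking them would close the window, while raising the expiry only protects objects whose mtime is recent.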
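ZheNing's "easy and poor performance" idea of a repository-wide lock could look roughly like the following Python sketch. Nothing here is an existing Git feature: the lock file name and the `acquire()`/`release()` helpers are hypothetical. The sketch only relies on the create-exclusively pattern (`O_CREAT | O_EXCL`) that Git also uses for its own `*.lock` files.

```python
"""Hypothetical repository-wide maintenance lock (not a Git feature)."""
import os

LOCK_NAME = "repo-maintenance.lock"  # hypothetical file name inside $GIT_DIR


def acquire(git_dir: str) -> bool:
    """Try to take the repository-wide lock; return False if it is held.

    O_CREAT | O_EXCL makes creation atomic: if two processes race,
    exactly one of them creates the file and the other gets EEXIST.
    """
    path = os.path.join(git_dir, LOCK_NAME)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")  # record the holder to help debug stale locks
    return True


def release(git_dir: str) -> None:
    """Drop the lock taken by acquire()."""
    os.remove(os.path.join(git_dir, LOCK_NAME))
```

A wrapper around git gc would call `acquire()` first and `release()` when gc finishes; a pre-receive hook would exit with a non-zero status, which rejects the push, when `acquire()` returns False, and a post-receive hook would call `release()`. As Ævar pointed out, this only helps if every relevant operation goes through these paths. It also serializes pushes, and a lock left behind by a crashed process has to be cleaned up by hand.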

Developer Spotlight: Teng Long

  • Who are you and what do you do?

    My work is related to R&D efficiency tools development at Alibaba Cloud. Our team has built a code hosting service, codeup.aliyun.com, which provides free and high-quality code services for Chinese developers on the public cloud. In addition, I used to be a Gerrit contributor, as I wrote Java for nearly 10 years, and this made me almost forget the C language, LOL.

    As for contributions to the Git community, besides me, Jiang Xin (the Git localization coordinator), ZheNing Hu, and Chen BoJun are also on the team.

  • What would you name your most important contribution to Git?

    First of all, I have known Git for some years, but I'm new to the community. Git's technical depth is obvious: it involves algorithms, operating systems, testing techniques, etc. Also, Git has many subcommands, so its implementation touches many areas. I think it is difficult for a new contributor to understand everything, but long-term participation may make you an expert in one aspect of Git. Sadly, the time I can devote to the Git community is actually limited.

    Last year I contributed a feature that lets the git ls-tree subcommand support the --format option, which lets you print the result however you want. I think this is helpful for automated tools and scripts. If you want to know more about it, a better way is to read the blog post by Taylor Blau.

  • What are you doing on the Git project these days, and why?

    I've been following the evolution of the bundle-uri feature recently. I think the idea behind this feature is great and attractive. If used properly, it can not only speed up code downloads in some scenarios, but also reduce the load on the server.

    I'm also reading algorithm-related code (like bitmaps, multi-pack bitmaps, and Bloom filters), as I want to learn some details about how Git uses these algorithms. I think it's interesting.

  • If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

    We all know that providing large-scale Git services can be painful in terms of resource load and cost. I would like to solve the problem of Git coupling storage and computing, so that Git integrates better with cloud-native architectures. For example, should it be possible to store the refs, loose objects, and packs in a distributed database?

    I think this is one of the future development directions of the Git architecture, driven by lower cost and cloud friendliness. If you want to build such things on top of Git, you may need to make the related internal implementations more adaptable, which I think requires a lot of professional work.

  • If you could remove something from Git without worrying about backwards compatibility, what would it be?

    Maybe I would introduce a new --branches option in git push to replace --all. The --all option pushes all branches and --tags pushes all tags, but many people (at least those around me) misunderstand it, because they think --all pushes all the branches and tags together. In fact, I sent an RFC patch earlier, hoping to support the --branches option as a first step, and I'll consider following up on this patch.

  • What is your favorite Git-related tool/library, outside of Git itself?

    I like git-repo, which supports doing code reviews or pull requests from the client, just like using a native Git subcommand.

  • Do you happen to have any memorable experience w.r.t. contributing to the Git project? If yes, could you share it with us?

    I still remember when my first commit was merged, even though it was a small fix. This process made me understand that contributing to Git is completely different from other workflows, and both the process and the results feel good.

  • What is your toolbox for interacting with the mailing list and for development of Git?

    First, I use https://public-inbox.org/git/?q=a%3Adyroneteng to check whether there are any new emails related to me.

    Then, I've been using git format-patch to create patchsets and git send-email to post them, and git am for local reviews. I don't know if there's a better way, but it seems to be enough for me.

  • What is your advice for people who want to start Git development? Where and how should they start?

    Contributing to Git is not an easy task; after all, you are working with other excellent contributors in the community. But continuously learning and participating may make you an expert in a certain area.

  • If there's one tip you would like to share with other Git developers, what would it be?

    I think it would be "get used to the process of contribution slowly".

    The review process is sometimes frustrating, but most of the suggestions by reviewers are still valuable; you can learn a lot from the process, and then participate better in your next contribution.
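As a small illustration of the ls-tree --format option Teng Long mentions, the following Python sketch runs git ls-tree with a custom format and parses the result. It uses the %(objectname), %(objecttype) and %(path) placeholders and the %09 (TAB) escape, and it passes -z, which per the git ls-tree documentation terminates entries with NUL and disables path quoting. The `Entry` type and the helper names are made up for this example, and running `ls_tree()` requires a Git version that supports --format.

```python
"""Sketch: script-friendly git ls-tree output via --format (helper names are illustrative)."""
import subprocess
from typing import List, NamedTuple

# ls-tree --format placeholders; %09 interpolates a TAB between fields.
FORMAT = "%(objectname)%09%(objecttype)%09%(path)"


class Entry(NamedTuple):
    oid: str
    kind: str
    path: str


def parse_ls_tree(output: str) -> List[Entry]:
    """Split NUL-terminated records into (oid, type, path) entries."""
    entries = []
    for record in output.split("\0"):
        if not record:
            continue  # the trailing NUL leaves an empty last record
        oid, kind, path = record.split("\t", 2)  # path is last and may contain TABs
        entries.append(Entry(oid, kind, path))
    return entries


def ls_tree(rev: str = "HEAD", repo: str = ".") -> List[Entry]:
    """Run git ls-tree in `repo` and return its parsed entries."""
    out = subprocess.run(
        ["git", "ls-tree", "-z", f"--format={FORMAT}", rev],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_tree(out)


if __name__ == "__main__":
    for entry in ls_tree():
        if entry.kind == "blob":
            print(entry.oid, entry.path)
```

Putting the path last and splitting at most twice keeps the parser correct even for file names that contain a TAB.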
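The git push naming confusion Teng Long describes is about ref namespaces: per the git push documentation, --all pushes the refs under refs/heads/ (branches), --tags pushes those under refs/tags/, and --mirror covers everything under refs/. The toy Python function below is hypothetical and ignores remotes, refspecs, and every other option; it only makes the difference between these namespaces visible.

```python
"""Toy model of which local refs each git push option selects (not Git's code)."""
from typing import List

# Namespace each option covers, per the git push documentation.
PREFIXES = {
    "--all": "refs/heads/",  # all branches, despite the name
    "--tags": "refs/tags/",  # all tags
    "--mirror": "refs/",     # everything; refs deleted locally are also removed remotely
}


def refs_to_push(local_refs: List[str], option: str) -> List[str]:
    """Return the local refs that `option` would select in this toy model."""
    prefix = PREFIXES[option]
    return [ref for ref in local_refs if ref.startswith(prefix)]


if __name__ == "__main__":
    refs = ["refs/heads/main", "refs/heads/topic", "refs/tags/v1.0"]
    print(refs_to_push(refs, "--all"))   # ['refs/heads/main', 'refs/heads/topic']
    print(refs_to_push(refs, "--tags"))  # ['refs/tags/v1.0']
```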

Releases

Other News

Various

Light reading

Git tools and sites

  • Git-Sim: Visually Simulate Git Operations In Your Own Repos. Run a one-liner git-sim command in the terminal, for example git-sim reset HEAD^ or git-sim merge dev, to generate a custom Git command visualization (.jpg, .mp4) from your repository. Written in Python, available as a package on PyPI.
  • heatwave is a tool to visualize your Git commits with a heat map in the terminal, similar to how GitHub's heat map looks. Written in Python, also available as a package on PyPI.
  • git-stats is a similar tool to visualize local git statistics, including GitHub-like contributions calendars. Written in JavaScript + HTML, available as an npm package.
    • Note that Git-Stats, also known as GitStats.me, is an unrelated open-source GitHub contribution analyzer as a web service, which was mentioned in Git Rev News Edition #63.
  • scmrepo by Iterative is an SCM wrapper and fsspec filesystem for Git for use in DVC. Works with multiple backends: pygit2 (libgit2), Dulwich, and GitPython.
  • gptcommit is a git prepare-commit-msg hook for authoring commit messages with the GPT-3 language model. Written in Rust.
    Note: you need to ensure you have sufficient credits in your OpenAI account to use it.
  • There are a few software forges working on implementing ForgeFed and/or ActivityPub federation. ForgeFed (formerly GitPub) is a federation protocol for forge services, first mentioned in Git Rev News Edition #69 in 2020.
    • Vervis is a project hosting and management application, with a focus on software projects and decentralization. Self-hosted on the https://vervis.peers.community instance. Supports Git and Darcs, and ForgeFed/ActivityPub federation. It is currently very much a work in progress. Written in Haskell.
    • ForgeFlux is API-space software forge federation with ForgeFed for Gitea, Sourcehut, GitLab, and GitHub. Repositories are (also) hosted on GitHub. Note: the project homepage on https://forgeflux.org/ seems to be down at the time of writing.
    • Forgefriends is a self-hosted forge federation project, whose purpose is to allow every Free Software developer to use their favorite forge to contribute to software projects hosted on other forges. Forgefriends is written in Go to share code with Gitea, and synchronization is done via the W3C ActivityPub protocol. It is currently in pre-alpha stage.
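To show the hook interface that a tool like gptcommit plugs into, here is a minimal Python sketch of a prepare-commit-msg hook. Git passes the hook the path of the commit message file, optionally followed by the message source (message, template, merge, squash or commit) and, for commit, a commit id. `draft_message()` is a hypothetical placeholder standing in for a language-model call; this is not gptcommit's code.

```python
#!/usr/bin/env python3
"""Sketch of a prepare-commit-msg hook in the spirit of gptcommit (not its code)."""
import subprocess
import sys
from typing import List, Optional


def draft_message(staged_files: List[str]) -> str:
    """Hypothetical placeholder for a language-model call: name the staged paths."""
    return "Update " + ", ".join(staged_files) if staged_files else ""


def prepare(msg_path: str, source: Optional[str], staged_files: List[str]) -> None:
    """Prepend a draft subject line for a plain `git commit`.

    When Git reports a source (message, template, merge, squash, commit),
    the message was already prepared, so it is left untouched.
    """
    if source:
        return
    with open(msg_path, encoding="utf-8") as f:
        existing = f.read()
    draft = draft_message(staged_files)
    if draft:
        with open(msg_path, "w", encoding="utf-8") as f:
            f.write(draft + "\n" + existing)


def main(argv: List[str]) -> int:
    # List the paths staged for this commit, one per line.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    prepare(argv[1], argv[2] if len(argv) > 2 else None, staged)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved as .git/hooks/prepare-commit-msg and made executable, it would prepend a draft line for plain `git commit` invocations and leave the other cases alone.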

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Teng Long.