Add parallelized restore capability to ghe-restore-storage #635
Sorry for the delay on this. I think it looks good to me. Have you tried this out with a large amount of data, to get an idea of the improvement? Given this is just rsync'ing data from |
|
Was originally bottlenecked by my home connection speed for effectively testing this, but wound up spinning up an EC2 instance with a 5Gbit NIC to remove that problem :) With that said, disk performance on my backup host will still likely be a bottleneck as I'm seeing transfer and disk speeds plummet the more I use my test instance. Added 6GB worth of 1GB files to random locations for the storage restore into the
Time without parallelism (~9 min):
Time with parallelism (3 min 40 sec):
I'll note that I burned a fair bit of time on this getting
I've also verified that restores to an HA environment still function (with parallelism enabled or disabled) and are otherwise unaffected by these updates |
|
@maclarel I think you can just install |
This introduces a parallelized restore of `storage` data, with a number of rsync threads equal to the number of storage nodes. This is the same logic used for `ghe-restore-repositories`, simply ported over to `ghe-restore-storage`.

For customers that are heavy users of LFS in a clustered environment, this can improve performance significantly, cutting run time by a factor of up to the number of nodes. Specifically, `rsync` only utilizes a single thread of `sshd`, so when high transfer speeds are possible it is likely that `sshd` will become CPU bound, limiting transfer speed.

For example, restoring ~5TB of data across 5 storage nodes would complete in approximately 16 hours assuming a transfer speed of 100MB/s (roughly where we see `sshd` become CPU bound), as the restores would be run sequentially. Assuming sufficient bandwidth for transfers at 500MB/s (achievable on a 10Gbit connection), this could reduce the overall time to approximately 3 hours, as all 5 `rsync` invocations would be run simultaneously and would utilize 1 thread per server, effectively quintupling performance.

Verbose log confirms that all 3 are being kicked off at the same time, which aligns with what is seen for `ghe-restore-repositories` behaviour:
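
As a rough illustration of the reasoning in this description, the Python sketch below models the two ideas: one concurrent transfer per storage node, and the resulting time estimate. It is not the `ghe-restore-storage` implementation (which is a shell script). The node names, the `fake_restore` stand-in for an `rsync` invocation, and the decimal unit conversion (1TB = 1,000,000MB) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_hours(total_mb: float, mb_per_s: float) -> float:
    """Idealized transfer time in hours, ignoring all overhead."""
    return total_mb / mb_per_s / 3600


def restore_all(nodes, restore_one):
    """Run restore_one(node) for every node concurrently, one worker per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map returns results in the same order as the input nodes.
        return list(pool.map(restore_one, nodes))


def fake_restore(node: str) -> str:
    """Hypothetical stand-in for a single rsync invocation against one node."""
    return f"restored {node}"


# Figures taken from the description: ~5TB, 5 nodes, ~100MB/s per stream.
total_mb = 5_000_000  # assumption: decimal units
nodes = [f"storage-node-{i}" for i in range(1, 6)]  # hypothetical names
per_stream_mb_s = 100

# Sequential: one sshd-bound stream at a time handles all of the data.
sequential = transfer_hours(total_mb, per_stream_mb_s)
# Parallel: one stream per node, so aggregate throughput scales with node count.
parallel = transfer_hours(total_mb, per_stream_mb_s * len(nodes))

print(f"sequential: {sequential:.1f} h, parallel: {parallel:.1f} h")
print(restore_all(nodes, fake_restore))
```

Under these idealized assumptions the figures come out at about 13.9 hours sequential and 2.8 hours parallel, a little under the approximate 16 and 3 hours quoted above, which may reflect overhead that this model leaves out. The ratio between the two is the same 5x, and it only holds while network bandwidth and disk throughput on the backup host keep up with the combined streams.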