Investigate git clone --reference[-if-able] when using --path-cache/--name-cache #625
I hope the description is complete, because the Discord link is behind a "registration wall". Just FYI.
I think there's enough in there, thanks.
@mbolivar-ampere, I'd be happy to do that. I looked through the Discord discussion to see which metrics you were interested in and got a good idea, and I can probably add more. The thing, though, is that unless one is dealing with large repositories containing GBs of objects and many active workspaces, it is hard to see the value, and in some cases the necessity, of object sharing. That's what I was trying to convey (unsuccessfully) in #695.

So, in short, scale matters: both the size of the individual repositories and the number of workspaces that clone them. This is particularly important in CI, but it also affects interactive users.

Now, I'm sure an immediately obvious question for some readers after seeing "GBs of objects" is: "Why do you even have such large repositories? Do you version large binaries in Git, which is really not intended for that purpose? Consider using LFS instead, or truncate and rewrite old history." All of those are valid questions and options, but making any of them happen in production is no easy task.

Bottom line: a tool built on top of Git should offer both of the primary mechanisms Git provides for dealing with many clones of very large repositories: shallow depth and object sharing.
Engineering is always about trade-offs and numbers, and performance even more so. You can almost always find a specific use case that will benefit from pretty much any optimization. The question is always "is it worth it?" In other words, is the extra code, and the corresponding extra maintenance cost, worth the benefits? Of course this is not an exact science: we'll never know exactly how many people use big Git repos with west, or how big those repos are. But that should not stop us from looking at some examples and incomplete data. That is still better than no data at all, because performance is always full of surprises. For the same reason, we also need some estimate of the complexity of the code changes and the corresponding maintenance burden. Considering the extremely limited manpower, anything that significantly affects the maintenance of "mainstream" features for the benefit of very few people using Git "the wrong way" could be a blocker.
I performed some tests with my draft PR. Here is an explanation of the two legends in the graph below: west update times were captured from our dev CI system. Each west update execution ran on a clean Azure F16s_v2 VM (with an ephemeral disk, and with some host cache prepping included). Even with such a limited number of samples, I think the current local clone approach is significantly faster than the reference repo approach I used. The state of the cache was the same for all executions; it was not up to date for all projects, but it was quite recent, and I believe this is quite close to what we would see in actual CI. On my laptop, with a fully up-to-date cache, the local clone approach is also significantly faster than using a reference. If anyone has ideas on how to improve the reference case, please let me know.
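For anyone who wants to reproduce a comparison like this outside CI, here is a minimal timing sketch. It assumes the "local clone" strategy amounts to `git clone <cache>` (for a local path, git hardlinks objects when source and destination share a file system and copies them otherwise) and the reference strategy to `git clone --reference-if-able <cache> <url>`. The helper names are hypothetical and are not west's code. It reports the median of several runs to damp VM noise.

```python
import statistics
import time
from typing import Callable, Dict, List


# Hypothetical helper (not west's code): clone directly from the local cache.
# For a local path, git hardlinks objects when possible and copies them when
# the clone crosses a file system boundary.
def local_clone_cmd(cache_repo: str, dest: str) -> List[str]:
    return ["git", "clone", cache_repo, dest]


# Hypothetical helper: clone from the real remote, borrowing objects from the
# cache through .git/objects/info/alternates.
def reference_clone_cmd(url: str, cache_repo: str, dest: str) -> List[str]:
    return ["git", "clone", "--reference-if-able", cache_repo, url, dest]


def median_seconds(run_once: Callable[[], None], runs: int = 5) -> float:
    """Call run_once `runs` times and return the median wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(strategies: Dict[str, Callable[[], None]], runs: int = 5) -> Dict[str, float]:
    """Return the median time for each named strategy."""
    return {name: median_seconds(fn, runs) for name, fn in strategies.items()}
```

Each callable should clone into a fresh destination directory (for example by wrapping `subprocess.run(local_clone_cmd(cache, fresh_dir), check=True)`), and the numbers are only comparable when every strategy starts from the same cache state, as in the CI runs described above.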
Should this be closed now?
In general, investigate and benchmark using .git/objects/info/alternates to point to the cache repository instead of cloning the cache repository (this is what git clone --reference and --reference-if-able set up). This may be faster when the clone has to cross a file system boundary, which is frequently the case when --path-cache and --name-cache are used in CI environments.
Reference to Discord discussion:
https://discordapp.com/channels/720317445772017664/906521547672522752/1074769891850211398
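A minimal Python sketch of the two ways to share objects with a cache repository described above: letting `git clone --reference-if-able` write the alternates file, or appending the cache's objects directory to `.git/objects/info/alternates` of an existing repository by hand. `clone_command` and `add_alternate` are hypothetical names for illustration, not part of west, and the sketch assumes POSIX paths.

```python
import os
from pathlib import Path
from typing import List, Optional


# Hypothetical helper (not west's API): build the argv for a clone that
# borrows objects from a local cache repository instead of copying them.
def clone_command(url: str, dest: str, cache_repo: Optional[str] = None,
                  dissociate: bool = False) -> List[str]:
    cmd = ["git", "clone"]
    if cache_repo is not None:
        # --reference-if-able: if cache_repo exists, git records its object
        # store in dest/.git/objects/info/alternates; if it does not exist,
        # git skips it with a warning and performs a normal clone.
        cmd += ["--reference-if-able", cache_repo]
        if dissociate:
            # --dissociate: borrow only to reduce network transfer, then copy
            # the borrowed objects so dest no longer depends on the cache.
            cmd.append("--dissociate")
    cmd += [url, dest]
    return cmd


# Hypothetical helper: manual equivalent for an existing repository.
# git_dir is the repository's .git directory.
def add_alternate(git_dir: str, cache_objects_dir: str) -> Path:
    info_dir = Path(git_dir) / "objects" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    alternates = info_dir / "alternates"
    # One object directory per line; an absolute path avoids any dependence
    # on where git_dir lives.
    target = os.path.abspath(cache_objects_dir)
    existing = alternates.read_text().splitlines() if alternates.exists() else []
    if target not in existing:
        # Rewrite the whole file so a missing trailing newline cannot merge entries.
        alternates.write_text("\n".join(existing + [target]) + "\n")
    return alternates
```

One design consideration: a repository that borrows objects this way can break if the cache later prunes objects it no longer references (for example during `git gc`), so `--dissociate` trades some of the disk savings for independence from the cache.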