Add caching of downloaded actions to achieve parity with the tool cache functionality #3551

Open
wants to merge 1 commit into
base: main

@Benjamint22 Benjamint22 commented Nov 12, 2024

This change makes real-time caching of tools and actions on an NFS mount possible.

Actual Changes

  • Add the environment variable ACTIONS_RUNNER_ACTION_ARCHIVE_EXTERNAL_CACHING_ENABLED. When it is not set to true, the current behavior is unchanged (actions are downloaded without being cached); when it is true, external caching is enabled.
  • Extend the action download step so that, when the variable above is true and the archive is not found in the cache location, the downloaded archive is copied to the cache location.
  • Add workflow output indicating that the copy took place.

Authored by

Signed-off-by: Tucker Fowler <tucker_fowler1@homedepot.com>
@Benjamint22 Benjamint22 requested a review from a team as a code owner November 12, 2024 21:51
Comment on lines +834 to +839
if (hasActionArchiveCache && externalCachingEnabled)
{
    // Persist the freshly downloaded archive so later runs find it in the cache.
    executionContext.Output($"Saving archive file to cache at '{cacheArchiveFile}'");
    Directory.CreateDirectory(Path.GetDirectoryName(cacheArchiveFile));
    File.Copy(archiveFile, cacheArchiveFile, true);
}

The code added in this PR steps into the middle of the action download flow and saves the downloaded archive to the cache folder without changing the rest of the logic. On subsequent runs, that archive is found and used by all runners as intended.

We also tried mounting the NFS share as the temp folder, as a “let’s try it and see what happens” experiment, knowing that it would persist far too much data. That did not work either.

According to our testing, a mounted NFS archive directory, whether it backs the “tool” cache or the “action” cache, HAS to live outside every directory created by the runner. For a mounted persistent cache to work, the actions caching process therefore HAS to behave the way the tool cache does, i.e., it must persist the downloaded archive to the provided cache directory.

@einsteinsbrd einsteinsbrd left a comment

Requested by our Account Manager.
We are building out a novel approach to actions and tool caching in ARCv2 that requires a small change to the runner's action caching process.

Right now the runner has mechanisms for caching tools and actions that reduce workflow run time on GitHub-hosted runners. However, those caches currently depend on being populated before deployment, using the https://github.com/actions/action-versions scripts in a multi-stage Docker build.

During a fast-track engagement with Ken Muse, we implemented those caches in our self-hosted runners. However, we use an inordinate number of tools and actions across our organization, so building the Docker image with those caches takes hours and results in a massive image.

We have found a way to cache in real time using a many-to-one GCP Filestore NFS share, exposed as a Persistent Volume in the GKE cluster where we deploy the actions runner scale set. The Persistent Volume Claim can then be mounted into the runner container's filesystem.

The tool cache, which is a standing GitHub solution, uses an environment variable to point to a custom directory and stores downloaded tools in that directory. The flow is as follows: check the cache for the tool; if found, use it; if not found, download it, persist it in the tool cache directory, and then use it; repeat. This makes it possible to mount our NFS share at an arbitrary directory and have downloaded tools persisted there for further use.
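The tool cache flow described above is a read-through cache. As a rough illustration only, it can be sketched in Python as follows. This is not the runner's or the toolkit's actual code: the function name `get_tool`, the `cache_dir/name/version` layout, and the `download` callback are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def get_tool(name: str, version: str, cache_dir: Path,
             download: Callable[[Path], None]) -> Path:
    """Return the tool's path, downloading and persisting it on a cache miss.

    Hypothetical sketch: the cache_dir/name/version layout is an assumption.
    """
    cached = cache_dir / name / version
    if cached.exists():
        # Cache hit: reuse what an earlier run (on any runner) persisted.
        return cached
    # Cache miss: download straight into the cache directory so the
    # result survives this run and is visible to later ones.
    cached.parent.mkdir(parents=True, exist_ok=True)
    download(cached)
    return cached
```

Because the miss path writes into `cache_dir` itself, pointing `cache_dir` at a shared mount is enough for every runner to benefit after the first download.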

The actions cache mechanism is different, however. The directory that the actions cache environment variable points to is only used to check for previously downloaded actions; it is never used to store newly downloaded actions so that they can be persisted. The logic flow is as follows: check the cache directory for the action; if found, use it; if not found, download it -> move it to _temp -> expand it to a random GUID-named file -> use the decompressed action; end. This makes it impossible to persist actions the way the tool cache persists tools. In other words, the actions cache is never populated with downloaded actions, even when the cache variable is provided.
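For contrast, here is a minimal Python sketch of the read-only actions flow described above. It is an approximation under stated assumptions, not the runner's C# implementation: `prepare_action`, the `<sha>.tar.gz` file naming, and the `download` callback are hypothetical, and the extraction target is modeled as a GUID-named directory.

```python
import tarfile
import uuid
from pathlib import Path
from typing import Callable, Optional


def prepare_action(sha: str, cache_dir: Optional[Path], temp_dir: Path,
                   download: Callable[[Path], None]) -> Path:
    """Return a directory holding the expanded action (hypothetical sketch)."""
    archive: Optional[Path] = None
    if cache_dir is not None:
        candidate = cache_dir / f"{sha}.tar.gz"
        if candidate.exists():
            archive = candidate  # cache hit: the cache is only ever read
    if archive is None:
        # Cache miss: the download lands in _temp, never in cache_dir.
        temp_dir.mkdir(parents=True, exist_ok=True)
        archive = temp_dir / f"{uuid.uuid4()}.tar.gz"
        download(archive)
    # Expand into a fresh GUID-named location and use it from there.
    dest = temp_dir / str(uuid.uuid4())
    dest.mkdir(parents=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest
```

In this sketch, calling `prepare_action` twice for the same `sha` downloads twice, because nothing on the miss path ever writes to `cache_dir`.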

This pull request adds a check that depends on an environment variable being set to true: if external caching is enabled, we copy the downloaded SHA.tar.gz/zip to the actions cache location, so that all runners across the ARC ecosystem have access to the cached actions/tools in real time after the first download.
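The gated copy can be pictured with the following Python sketch, which mirrors the shape of the C# snippet in the diff rather than reproducing the runner's code. The helper names are hypothetical, and treating only a case-insensitive "true" as enabled is an assumption about how the flag is parsed.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

FLAG = "ACTIONS_RUNNER_ACTION_ARCHIVE_EXTERNAL_CACHING_ENABLED"


def external_caching_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only a case-insensitive "true" turns the feature on.
    return env.get(FLAG, "").strip().lower() == "true"


def save_archive_to_cache(archive_file: Path,
                          cache_archive_file: Optional[Path],
                          env: Mapping[str, str] = os.environ) -> bool:
    """Copy a downloaded archive into the cache when the flag is on.

    Returns True if a copy took place. Hypothetical helper, not runner code.
    """
    if cache_archive_file is None or not external_caching_enabled(env):
        return False  # no cache configured, or feature off: behavior unchanged
    cache_archive_file.parent.mkdir(parents=True, exist_ok=True)
    # Overwrites an existing file, like File.Copy(..., true) in the diff.
    shutil.copyfile(archive_file, cache_archive_file)
    return True
```

With the flag unset, the function is a no-op, which matches the PR's intent of leaving the existing download path untouched by default.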

This solution has been tested following the contribution docs.
