[8.x] PGO: update the existing benchmarks workflow to enable PGO builds (backport #13884) #14245

Merged
merged 2 commits into from
Oct 11, 2024

@mergify mergify bot commented Oct 3, 2024

Motivation/summary

This PR implements the changes outlined in #13859. It updates the existing benchmarks workflow to run a standalone APM Server instance that produces a relevant CPU profile for PGO, then copies, uploads, and injects the resulting CPU profile into a PR (see example).

Benchmarks

The existing benchmark results turned out to be too unreliable to base PGO on. Because of the underlying dependency on Elasticsearch, throughput results could differ by more than 10% from one workflow run to the next. The table below shows a sample of the existing benchmark results.

image

All of this makes incremental PGO performance gains hard to observe and measure. Therefore, this PR introduces a new benchmark mode that swaps Elasticsearch for a stubbed API HTTP server (Moxy). This lets us better isolate and highlight the APM Server performance component in the benchmarks. The table below shows a sample of the new isolated benchmark results.

image
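
For readers unfamiliar with the approach, the idea of replacing Elasticsearch with a stub can be sketched in Go as below. This is not Moxy's actual code: the function names, the single `/_bulk` route, and the canned response body are assumptions for illustration, and a real client may expect a fuller bulk response (for example, one item per submitted document).

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// stubBulkHandler (hypothetical name) drains the request body and reports
// success without indexing anything, so the output destination adds almost
// no variance of its own to the benchmark.
func stubBulkHandler(w http.ResponseWriter, r *http.Request) {
	// Read and discard the payload so the sender sees the request fully consumed.
	_, _ = io.Copy(io.Discard, r.Body)
	_ = r.Body.Close()

	w.Header().Set("Content-Type", "application/json")
	// Assumption: some Elasticsearch clients check this product header.
	w.Header().Set("X-Elastic-Product", "Elasticsearch")
	_, _ = io.WriteString(w, `{"took":0,"errors":false,"items":[]}`)
}

// newStubHandler (hypothetical name) wires the stub onto a bulk-style route.
func newStubHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/_bulk", stubBulkHandler)
	return mux
}

// exampleRequest sends one in-memory request to the stub and returns the
// response status code and body.
func exampleRequest() (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/_bulk", strings.NewReader("{}\n"))
	rec := httptest.NewRecorder()
	newStubHandler().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}
```

Because the stub does no work beyond draining the request, most of the measured time should be spent in the sender, which is the property the new benchmark mode relies on.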

The sample data clearly shows that the deviation in results for the new benchmark mode is an order of magnitude lower than for the existing ES-based benchmarks, so PGO performance improvements can now be observed reliably.

The standalone APM Server benchmark mode runs 3 separate EC2 instances in a VPC, one each for apmbench, apm-server, and moxy. The existing benchmark_executor and standalone_apm_server Terraform modules are reused, and a similar new Terraform module, moxy, is added.

Results

PGO-enabled builds show a 5% performance gain on average across the standalone APM Server benchmarks workflow.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

To observe and validate the changes, please refer to the indexed PGO benchmark results.

Related issues

#13859


This is an automatic backport of pull request #13884 done by [Mergify](https://mergify.com).

Add a benchmark workflow mode with automation to collect, preserve, and inject CPU profiles, enabling PGO builds.

The new workflow will run on a schedule and raise a special pull request that includes the most recent representative CPU profile, which will be inserted as the `default.pgo` file into the main package and automatically used in the build pipeline. The actual schedule and the model for raising pull requests with updated profiles are subject to further revisions. This new workflow mode uses a lightweight output destination, a mock proxy (Moxy) from apm-perf, to better isolate the performance component of the APM Server.

(cherry picked from commit 5af8cf4)
@mergify mergify bot requested a review from a team as a code owner October 3, 2024 02:20
@mergify mergify bot added the backport label Oct 3, 2024
@mergify mergify bot assigned 1pkg Oct 3, 2024

1pkg commented Oct 3, 2024

I don't think that we need to backport these changes.

@1pkg 1pkg closed this Oct 3, 2024
@1pkg 1pkg reopened this Oct 11, 2024
@mergify mergify bot merged commit 023a775 into 8.x Oct 11, 2024
18 checks passed
@mergify mergify bot deleted the mergify/bp/8.x/pr-13884 branch October 11, 2024 15:52