-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cpufreqctl: avoid boosting when measuring pstate frequency #202
base: master
Are you sure you want to change the base?
cpufreqctl: avoid boosting when measuring pstate frequency #202
Conversation
994b02c
to
4cc0b88
Compare
4cc0b88
to
b151760
Compare
Rebased onto master, also testes functional on the latest version of Arch (GNOME 43). |
So, the main problem is the API with which the extension is interfacing with the cpufreqctl tool: # cpufreqctl --format json info current
{
"min": 2095908,
"max": 4192231,
"avg": 2916799,
"rnd": 4192206
} On my AMD Ryzen this creates no problems, but does create problems when
This API has 1 positive thing and 1 negative thing. The positive thing is that it provides 4 different ways to gather the CPU frequency with the So, the first thing would be to redesign the API to something like this: # cpufreqctl --format json info current min
1863003
# cpufreqctl --format json info current max
4192190
# cpufreqctl --format json info current avg
2597208
# cpufreqctl --format json info current rnd
2104293 Thereby, all the frequency method can be queried individually. This is the first step. The next step would be to make So why was this API designed like this in the first place? Because I wanted to be able to easily add new frequency gathering methods as e.g. the Apart from changing the I am unsure whether something like: # cpufreqctl --format json info current list
[
"min",
"max",
"avg",
"rnd"
] is necessary to query for the available gathering methods. Maybe hard-coding this into the extension is fine as all the methods require extension-specific UI elements like translations and explanations texts anyway. So, I think the boosting and high CPU load are a side-effect of this utterly flawed implementation for gathering CPU frequencies. If adjusted to only query 1 of the active cores and not waking any core that is sleeping this problem would be solved for not only the pstate driver but for all drivers. |
Thanks for your point of view! I am unsure what you mean by "waking up sleeping core", as sleeping core in the sense of actual sleep state exists on Android, not x86. On Intel / AMD CPU, the "sleep state" is just a throttling of the frequency when the OS has nothing to execute on the CPU. Therefore, there is no way (as far as I know) to avoid this boosting, except (which is what I wackly propose in this PR) putting Optimizing for battery life / not boosting can consist in rewriting the If the only call the extension is doing is Edit: Typo |
A rewrite in C is not possible as this would violate the Gnome Extension Review Guidelines.
We are already stretching the guidelines as we are not providing the
Regarding your technical question:
Yes, that's ofc correct. But I thought this is generally available for ARM devices under Linux. Is this behavior Android specific? Nevertheless, for x86 this means not using a core unless it is absolutely necessary. The
Yes, so I would like to avoid this tool from pushing the frequency of a "low-frequency", NOP'ing core by querying its frequency. I am not really sure how to implement this without waking (e.g. pushing the frequency) the core, but the But I'm open for any other suggestion solving this problem. However, I am not a big fan of the Edit: add link to Gnome Extension Review Guidelines |
I see the problem. I don't think probing the cores is the main behavior that lead to boosting. However, multiple calls to awk may be. Can we imagine relying on another back-end than I will think about it, then come back if I have more ideas. |
Hmm, I think our |
Is a GJS implementation out of question (for frequency reading)? This would avoid both the need of a script and the calls to Another possibility is to send directly the readings in the JSON stdout return, the compute the aggregation method (min/max/avg/rnd) in the GJS extension, avoiding |
No, this is mostly for historic reasons as it started out as a small bash script and got bigger and bigger. The GJS requirement for scripts got added to the Guidelines after we already implemented the bash scripts, if I remember correctly. If I remember correctly, reading and writing files has included a lot of boilerplate in GJS, whereas in bash it's just
This would still require some interface for reading the frequency of one core, for only performance cores, etc. I think it is simpler to make the cut for API at the min, max, avg, rnd border and not communicate all the "let's select a core for frequency reading" stuff. But if you come up with an elegant solution, I'm up for it. The main downside would be that you could not as easily replicate the readings displayed in the extension from the command line. I would say that's an inconvenience. However, I fear that a lot of details of the individual CPU backends would "bleed" through, so that the extension code would have to deal with pstate, cpufreq, etc. On another note: Can you write some test snippets where you infinitely read without a sleep from |
A first simple fix is to run I analyzed with
However, looking at distribution of time across PID shows only 5 % of the time is actually spent in For you experiments, I blocked my frequency to 1.4 GHz, no
I don't really know what to think about that, except that both options look usable. EDIT: I am printing to |
I think this is because
Damn it :D Looks like your CPU is not affected by freezing and/or system lock ups. Especially server CPUs like AMD Threadripper and Intel Xeon seem to have difficulty keeping the system responsive when excessively reading CPU frequencies. I expected the parallel approach to have a way higher CPU load and introduce stutter, but this does not seem to be the case. I originally opted for the parallel read approach as especially older AMD CPUs take ages to read a single CPU frequency (1-2 seconds) and a 16 core CPU would then take 30+ seconds for a sequential read of all the frequencies. So although your results clearly show that an API change is not really necessary (your performance numbers look fine), I still think that the API should be changed so that people having CPUs that cannot handle reading the frequencies of all cores can switch to the
You can try writing a proof-of-concept in GJS for just reading the current frequency with the 4 methods and see how the performance numbers change. For your use case this might be beneficial, for others I think an API change is more beneficial. (e.g. #184, #138, #52) |
I rewrote it in pure bash (as I don't know JS well, so I'll see later for a pure GJS version) to remove the need of
I tested this (sequential version) on a i9-7980XE (18 cores/36 threads) locked at 1.2 GHz and:
So I cannot reproduce the hang. Maybe it's arch, maybe Intel. I would argue that the |
tool/cpufreqctl
Outdated
compute_min_max_avg_rnd () | ||
{ | ||
measures=($1) | ||
nb_measures=${#measures[@]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙈 this is bashism and may or may not work on distros like Alpine or Ubuntu (they use ash and dash). You can use shellcheck -a
to lint against these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't tell me my sh
is, in fact, bash
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is on Arch but that's not for granted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a symlink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole sh vs. bash vs. dash compatibility thing is a deep rabbit hole. Yet another good reason to use GJS instead 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nooooo you're right! I'm correcting that right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that easy. POSIX sh can't handle arrays...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is to require bash
... ? But yeah, I spent way to much time on it, GJS is the right way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, we can't require bash as this would rule out the whole Ubuntu land.
With the latest changes, |
Let's got with one last try for today. Removing the array did not change the execution time on the 1065G7, but tripled it on the 7980XE (10,5 sec wall time). I'll test as it, then will head for the GJS implementation later. |
dff5513
to
66324ed
Compare
Insert a
sleep
to avoid artificial high CPU frequency (boosting) when usingcpufreqctl info current
on a i7-1065G7 (Lenovo Yoga C940)