Add support for Container CPU Utilization Resource Monitor #37792

Open
wants to merge 33 commits into
base: main
Changes from 9 commits
Commits
8d1863c
Add support for Container CPU Utilization Resource Monitor for loadh…
nix1n Dec 23, 2024
8c24c52
remove trailing whitespaces & add last newline
nix1n Dec 23, 2024
3de82b3
correct doc reference
nix1n Dec 23, 2024
174111d
fix changelog yamllint character limit
nix1n Dec 23, 2024
26d1e99
Add CGROUP in dictionary
nix1n Dec 23, 2024
59f6dd3
Merge branch 'envoyproxy:main' into main
nix1n Dec 23, 2024
b607d8c
Merge branch 'envoyproxy:main' into main
nix1n Dec 30, 2024
4315e3f
add container cpu utilization type monitor
nix1n Jan 1, 2025
354bec1
add myself as codeowner in cpu_utilization
nix1n Jan 1, 2025
beafb4b
correct proto config message
nix1n Jan 6, 2025
b6a619b
Merge branch 'envoyproxy:main' into main
nix1n Jan 6, 2025
e623cf4
docs and changelogs update
nix1n Jan 6, 2025
8979e8b
correct mode initialisation,and proto comments
nix1n Jan 7, 2025
060118f
refactor cpu stats error handling
nix1n Jan 7, 2025
a32c1f6
change time diff calculation strategy without timesource
nix1n Jan 8, 2025
ccd9456
Merge branch 'main' into main
nix1n Jan 8, 2025
3d7bbd1
fix test header
nix1n Jan 8, 2025
153a2a8
fix proto message comments
nix1n Jan 8, 2025
ca96425
fix proto message comments
nix1n Jan 8, 2025
114f444
report error from container cpu usage monitor
nix1n Jan 8, 2025
b93f3d8
add tests for missing cases
nix1n Jan 8, 2025
346cb53
Simplify Monitor class implementaion
nix1n Jan 9, 2025
9f90f9c
Merge branch 'envoyproxy:main' into main
nix1n Jan 9, 2025
64320f4
add missing library in BUIL
nix1n Jan 9, 2025
ef793ba
update dictionary
nix1n Jan 9, 2025
607050e
Merge branch 'envoyproxy:main' into main
nix1n Jan 10, 2025
e4817dc
add timesource support
nix1n Jan 10, 2025
1213a6c
fix doc and spelling
nix1n Jan 10, 2025
72da693
remove older restriction of refresh_interval > 0.01s from doc
nix1n Jan 10, 2025
2523785
revert unnecessary changed files
nix1n Jan 10, 2025
71bd67a
add more realistic calculation and test data
nix1n Jan 10, 2025
44188b2
fix clang format
nix1n Jan 10, 2025
c817300
fix spelling
nix1n Jan 10, 2025
2 changes: 1 addition & 1 deletion CODEOWNERS
Expand Up @@ -143,7 +143,7 @@ extensions/filters/common/original_src @klarose @mattklein123
/*/extensions/resource_monitors/common @eziskind @yanavlasov @nezdolik
/*/extensions/resource_monitors/fixed_heap @eziskind @yanavlasov @nezdolik
/*/extensions/resource_monitors/downstream_connections @nezdolik @mattklein123
/*/extensions/resource_monitors/cpu_utilization @cancecen @kbaichoo
/*/extensions/resource_monitors/cpu_utilization @cancecen @kbaichoo @nix1n
/*/extensions/retry/priority @alyssawilk @mattklein123
/*/extensions/retry/priority/previous_priorities @alyssawilk @mattklein123
/*/extensions/retry/host @alyssawilk @mattklein123
Expand Up @@ -3,6 +3,7 @@ syntax = "proto3";
package envoy.extensions.resource_monitors.cpu_utilization.v3;

import "udpa/annotations/status.proto";
import "validate/validate.proto";

option java_package = "io.envoyproxy.envoy.extensions.resource_monitors.cpu_utilization.v3";
option java_outer_classname = "CpuUtilizationProto";
Expand All @@ -12,8 +13,12 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;

// [#protodoc-title: CPU utilization]
// [#extension: envoy.resource_monitors.cpu_utilization]

enum UtilizationComputeStrategy {
  HOST = 0;
  CONTAINER = 1;
}

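// The CPU utilization resource monitor reports CPU utilization to the Envoy process. In ``HOST`` mode
// (the default) it reports the utilization of the entire host, calculated from the stats in the
// /proc/stat file. In ``CONTAINER`` mode it reports the utilization of the Envoy container,
// calculated from the cgroup CPU stats. Today, this only works on Linux.
message CpuUtilizationConfig {
  UtilizationComputeStrategy mode = 1;
}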
5 changes: 5 additions & 0 deletions changelogs/current.yaml
Expand Up @@ -406,6 +406,11 @@ new_features:
  change: |
    Add the option to reduce the rate limit budget based on request/response contexts on stream done.
    See :ref:`apply_on_stream_done <envoy_v3_api_field_config.route.v3.RateLimit.apply_on_stream_done>` for more details.
- area: resource_monitors
  change: |
    Added an extension to monitor container CPU utilization in Linux Kubernetes environments via the
    :ref:`envoy container cpu utilization monitor
    <envoy_v3_api_msg_extensions.resource_monitors.envoy_container_cpu_utilization.v3.EnvoyContainerCpuUtilizationConfig>`
    in the overload manager.

deprecated:
- area: rbac
@@ -0,0 +1,13 @@
overload_manager:
  refresh_interval: 5s
  resource_monitors:
  - name: "envoy.resource_monitors.envoy_container_cpu_utilization"
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.resource_monitors.envoy_container_cpu_utilization.v3.EnvoyContainerCpuUtilizationConfig
  actions:
  - name: "envoy.overload_actions.stop_accepting_requests"
    triggers:
    - name: "envoy.resource_monitors.envoy_container_cpu_utilization"
      scaled:
        scaling_threshold: 0.80
        saturation_threshold: 0.95
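
Note on the `scaled` trigger in this config: as I understand the overload manager documentation, a scaled trigger leaves the action inactive below `scaling_threshold`, fully active at or above `saturation_threshold`, and scales it linearly in between. The sketch below only illustrates that mapping; `scaledActionState` is a hypothetical helper, not an Envoy API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not an Envoy API): maps a resource pressure in [0, 1]
// to an overload-action state in [0, 1] for a scaled trigger.
double scaledActionState(double pressure, double scaling_threshold,
                         double saturation_threshold) {
  assert(scaling_threshold < saturation_threshold);
  // Linear interpolation between the two thresholds, clamped to [0, 1].
  const double fraction =
      (pressure - scaling_threshold) / (saturation_threshold - scaling_threshold);
  return std::clamp(fraction, 0.0, 1.0);
}
```

With the thresholds above (0.80 and 0.95), a smoothed container utilization of 0.875 sits halfway between them, so the action state would be about 0.5.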
Expand Up @@ -375,6 +375,30 @@ workload.
    :linenos:
    :caption: :download:`cpu_utilization_monitor_overload.yaml <_include/cpu_utilization_monitor_overload.yaml>`

Loadshedding in K8s environment
-------------------------------

In a Kubernetes environment, where Envoy workloads often share node resources with other applications, configuring this
overload action with a target container CPU utilization percentage offers a more adaptable approach than defining a fixed
request rate. This ensures that Envoy workloads can dynamically manage their CPU usage based on container-level metrics
without impacting other co-located workloads.

The ``envoy.overload_actions.stop_accepting_requests`` overload action can be used to protect Envoy workloads
in a Kubernetes environment from degraded performance during unexpected spikes in incoming requests
that saturate the container's allocated CPU resources. When combined with the ``envoy.resource_monitors.envoy_container_cpu_utilization``
resource monitor, this overload action reduces CPU pressure by rejecting new requests at minimal computational cost.
While the long-term solution for such spikes is to scale the workload horizontally,
this overload action helps keep the fleet stable and prevents cascading failures.

.. literalinclude:: _include/container_cpu_utilization_monitor_overload.yaml
    :language: yaml
    :lines: 1-13
    :emphasize-lines: 3-13
    :linenos:
    :caption: :download:`container_cpu_utilization_monitor_overload.yaml <_include/container_cpu_utilization_monitor_overload.yaml>`

If neither CPU requests nor CPU limits are set for the Envoy deployment in Kubernetes, use ``envoy.resource_monitors.cpu_utilization``
instead: without resource requests or limits, the Envoy container can use as much CPU as is available on the Kubernetes node.

Statistics
----------
1 change: 0 additions & 1 deletion source/extensions/extensions_build_config.bzl
Expand Up @@ -252,7 +252,6 @@ EXTENSIONS = {
"envoy.resource_monitors.injected_resource": "//source/extensions/resource_monitors/injected_resource:config",
"envoy.resource_monitors.global_downstream_max_connections": "//source/extensions/resource_monitors/downstream_connections:config",
"envoy.resource_monitors.cpu_utilization": "//source/extensions/resource_monitors/cpu_utilization:config",

#
# Stat sinks
#
12 changes: 8 additions & 4 deletions source/extensions/resource_monitors/cpu_utilization/config.cc
@@ -1,9 +1,8 @@
#include "source/extensions/resource_monitors/cpu_utilization/config.h"

#include "envoy/common/time.h"
#include "envoy/extensions/resource_monitors/cpu_utilization/v3/cpu_utilization.pb.h"
#include "envoy/extensions/resource_monitors/cpu_utilization/v3/cpu_utilization.pb.validate.h"
#include "envoy/registry/registry.h"

#include "source/common/protobuf/utility.h"
#include "source/extensions/resource_monitors/cpu_utilization/cpu_utilization_monitor.h"
#include "source/extensions/resource_monitors/cpu_utilization/linux_cpu_stats_reader.h"
Expand All @@ -15,10 +14,15 @@ namespace CpuUtilizationMonitor {

Server::ResourceMonitorPtr CpuUtilizationMonitorFactory::createResourceMonitorFromProtoTyped(
const envoy::extensions::resource_monitors::cpu_utilization::v3::CpuUtilizationConfig& config,
Server::Configuration::ResourceMonitorFactoryContext& /*unused_context*/) {
Server::Configuration::ResourceMonitorFactoryContext& context) {
// In the future, the below can be configurable based on the operating system.
TimeSource& time_source = context.api().timeSource();
  if (config.mode() ==
      envoy::extensions::resource_monitors::cpu_utilization::v3::UtilizationComputeStrategy::CONTAINER) {
    auto cgroup_stats_reader = std::make_unique<LinuxContainerCpuStatsReader>();
    return std::make_unique<CpuUtilizationMonitor>(config, std::move(cgroup_stats_reader),
                                                   time_source);
  }
auto cpu_stats_reader = std::make_unique<LinuxCpuStatsReader>();
return std::make_unique<CpuUtilizationMonitor>(config, std::move(cpu_stats_reader));
return std::make_unique<CpuUtilizationMonitor>(config, std::move(cpu_stats_reader), time_source);
}

/**
Expand Up @@ -22,13 +22,26 @@ struct CpuTimes {
uint64_t total_time;
};

struct CgroupStats {
Contributor

What is the reason we can't use the existing CpuTimes structure, given that it has effectively the same fields?

This would simplify the utilization monitor, as it could then just use the CpuStatsReader interface instead of the concrete implementation class.

Author

@KBaichoo For host CPU utilization we only need the /proc/stat file, which provides both fields of CpuTimes (work time and total time).
To calculate container CPU utilization we need to read two different files: one for the allocated quota and one for the cumulative CPU usage in nanoseconds at that moment. The elapsed time between samples has to come from a timer; that is the only way I could find to calculate it. We already use this method in our Ambassador Edge Stack service, where a Python script running in parallel updates the injected resource pressure using the same calculation strategy.

ref: google/cadvisor#2026 (comment)

So I created another class to read cgroup stats, which reads the allocated quota and the total CPU time in nanoseconds.

This could be merged into the CpuTimes class itself, but I am not sure what naming would be meaningful for both cases, even though both fields are uint64_t. The reader class would then also need access to the config mode so that it can calculate and return stats accordingly.

Author

For cgroup metrics, the time difference can only be calculated with a TimeSource. We don't need that when calculating host usage from the /proc/stat file, since all of the data in that file is already time-based.

Contributor

This is a good point. If we were to push the time source down into the cgroup stats reader (along with some other logic), we would be able to produce the same format as CpuTimes -- e.g. https://github.com/envoyproxy/envoy/pull/37792/files#diff-1183c2c3937672e9d4c85d700d1e54ef13d360b4b517a0c643a8e46c13c3eb79R120

We could then de-duplicate a lot of the implementation details that are currently pulled up into the monitor layer, so that the monitor does not need an implementation for each of the different readers.

Author

@KBaichoo In that case, in the cgroup stats reader, I would have to derive a calculation that folds the timing into the allocated millicores and the cumulative CPU usage, so that the result is equivalent to the total time and work time fields of CpuTimes. Is that what you mean? The resource monitor would then keep using the original strategy, without checking the config mode and switching the calculation strategy?

Author

> This is a good point, if we were to push down the time source into the cgroup stats reader (along with some other logic) we would be able to produce the same format for CpuTimes -- e.g.

I have checked the calculation strategy, and it should work. I am now trying to push the time source down from the context into the stats reader.

Author

@KBaichoo /proc/uptime has a precision of one hundredth of a second. We could use it instead of the time source, as long as refresh_interval is not set below 0.01 seconds. But where would we enforce that limit? How small can refresh_interval be set? Is that defined anywhere? We use a refresh_interval of 5 seconds for load shedding in our Ambassador Edge Stack. Is there any use case where developers would want a refresh interval below 0.01 seconds?

Author

Using the /proc/uptime metrics would let us get rid of the time source in the reader class and give a much simpler implementation, with one limitation: a refresh_interval below 0.01 seconds won't work properly, although the resource pressure would still be updated as soon as the monitor's loop interval crosses 0.01 seconds. For a refresh_interval greater than 0.01 seconds it would work well. Please let me know whether this is acceptable. For us it would be sufficient.

Author

@KBaichoo I have implemented the functionality and all tests using /proc/uptime. This should be sufficient for us.

Contributor

I might be missing something, but why do we need to use uptime rather than the time source? It seems to me that it might be cheaper to use the time source (e.g. no file open), and there would be no limitation from the uptime granularity.

100% agree that for most use cases polling resource monitors faster is a great way to eat up CPU.

  bool is_valid;
  uint64_t cpu_allocated_millicores_; // Total millicores of CPU allocated to the container.
  uint64_t total_cpu_times_ns_;       // Total CPU time in nanoseconds.
};

class CpuStatsReader {
public:
CpuStatsReader() = default;
virtual ~CpuStatsReader() = default;
virtual CpuTimes getCpuTimes() = 0;
};

class CgroupStatsReader {
public:
CgroupStatsReader() = default;
virtual ~CgroupStatsReader() = default;
virtual CgroupStats getCgroupStats() = 0;
};

} // namespace CpuUtilizationMonitor
} // namespace ResourceMonitors
} // namespace Extensions
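
Note on the review thread above: the suggestion to push the time source down into the cgroup reader can be sketched as follows. This is only an illustration under stated assumptions, not the PR's implementation: `cgroupSampleToCpuTimes` and `utilizationBetween` are hypothetical names, the allocation is assumed to stay constant between samples, and the host path is assumed to compute a ratio of deltas over `CpuTimes`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Same shape as the CpuTimes struct in the diff above.
struct CpuTimes {
  bool is_valid;
  uint64_t work_time;
  uint64_t total_time;
};

// Hypothetical conversion (not part of the PR): expresses one cgroup sample as CpuTimes.
//   usage_ns     cumulative CPU time used by the container, in nanoseconds
//   millicores   CPU allocated to the container
//   monotonic_ns monotonic clock reading taken with the sample, in nanoseconds
CpuTimes cgroupSampleToCpuTimes(uint64_t usage_ns, uint64_t millicores,
                                uint64_t monotonic_ns) {
  if (millicores == 0) {
    return {false, 0, 0};
  }
  // CPU time the allocated cores could have provided since the clock's epoch.
  // Dividing first keeps the product small; this assumes it still fits in 64 bits.
  const uint64_t capacity_ns = monotonic_ns / 1000 * millicores;
  return {true, usage_ns, capacity_ns};
}

// The monitor-side arithmetic is then the same ratio of deltas for every reader.
std::optional<double> utilizationBetween(const CpuTimes& previous, const CpuTimes& current) {
  assert(previous.is_valid && current.is_valid);
  if (current.total_time <= previous.total_time || current.work_time < previous.work_time) {
    return std::nullopt;
  }
  return static_cast<double>(current.work_time - previous.work_time) /
         static_cast<double>(current.total_time - previous.total_time);
}
```

With this shape, a container reader could implement the existing `CpuStatsReader` interface and the monitor would not need a second constructor or a mode switch.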
Expand Up @@ -27,13 +27,37 @@ constexpr double DAMPENING_ALPHA = 0.05;

CpuUtilizationMonitor::CpuUtilizationMonitor(
const envoy::extensions::resource_monitors::cpu_utilization::v3::
CpuUtilizationConfig& /*config*/,
std::unique_ptr<CpuStatsReader> cpu_stats_reader)
: cpu_stats_reader_(std::move(cpu_stats_reader)) {
    CpuUtilizationConfig& config,
    std::unique_ptr<CpuStatsReader> cpu_stats_reader, TimeSource& time_source)
    : cpu_stats_reader_(std::move(cpu_stats_reader)), time_source_(time_source),
      last_update_time_(time_source.monotonicTime()), mode_(config.mode()) {
previous_cpu_times_ = cpu_stats_reader_->getCpuTimes();
}

CpuUtilizationMonitor::CpuUtilizationMonitor(
const envoy::extensions::resource_monitors::cpu_utilization::v3::
    CpuUtilizationConfig& config,
    std::unique_ptr<CgroupStatsReader> cgroup_stats_reader, TimeSource& time_source)
    : cgroup_stats_reader_(std::move(cgroup_stats_reader)), time_source_(time_source),
      last_update_time_(time_source.monotonicTime()), mode_(config.mode()) {
previous_cgroup_stats_ = cgroup_stats_reader_->getCgroupStats();
}

void CpuUtilizationMonitor::updateResourceUsage(Server::ResourceUpdateCallbacks& callbacks) {
  switch (mode_) {
  case envoy::extensions::resource_monitors::cpu_utilization::v3::UtilizationComputeStrategy::HOST:
    computeHostCpuUsage(callbacks);
    break;
  case envoy::extensions::resource_monitors::cpu_utilization::v3::UtilizationComputeStrategy::CONTAINER:
    computeContainerCpuUsage(callbacks);
    break;
  default:
    break;
  }
}

void CpuUtilizationMonitor::computeHostCpuUsage(Server::ResourceUpdateCallbacks& callbacks) {
CpuTimes cpu_times = cpu_stats_reader_->getCpuTimes();
if (!cpu_times.is_valid) {
const auto& error = EnvoyException("Can't open file to read CPU utilization");
Expand Down Expand Up @@ -66,6 +90,48 @@ void CpuUtilizationMonitor::updateResourceUsage(Server::ResourceUpdateCallbacks&
previous_cpu_times_ = cpu_times;
}

void CpuUtilizationMonitor::computeContainerCpuUsage(Server::ResourceUpdateCallbacks& callbacks) {
  CgroupStats envoy_container_stats = cgroup_stats_reader_->getCgroupStats();
  if (!envoy_container_stats.is_valid) {
    const auto& error = EnvoyException("Can't open Cgroup cpu stat files");
    callbacks.onFailure(error);
    return;
  }

  const uint64_t cpu_milli_cores = envoy_container_stats.cpu_allocated_millicores_;
  if (cpu_milli_cores == 0) {
    const auto& error = EnvoyException(fmt::format(
        "Erroneous CPU Allocated Value: '{}', should be a positive number", cpu_milli_cores));
    callbacks.onFailure(error);
    return;
  }

  // The counters are unsigned, so compare before subtracting; otherwise a counter that did not
  // advance (or went backwards) would wrap around to a huge positive value.
  if (envoy_container_stats.total_cpu_times_ns_ <= previous_cgroup_stats_.total_cpu_times_ns_) {
    const auto& error = EnvoyException(
        fmt::format("Erroneous CPU Work Value: current '{}' should be greater than previous '{}'",
                    envoy_container_stats.total_cpu_times_ns_,
                    previous_cgroup_stats_.total_cpu_times_ns_));
    callbacks.onFailure(error);
    return;
  }
  const uint64_t cpu_work =
      envoy_container_stats.total_cpu_times_ns_ - previous_cgroup_stats_.total_cpu_times_ns_;

  const MonotonicTime current_time = time_source_.monotonicTime();
  const double system_time_elapsed_milliseconds =
      std::chrono::duration_cast<std::chrono::milliseconds>(current_time - last_update_time_)
          .count();
  if (system_time_elapsed_milliseconds <= 0) {
    const auto& error = EnvoyException(
        fmt::format("Erroneous Value of Elapsed Time: '{}', should be a positive number",
                    system_time_elapsed_milliseconds));
    callbacks.onFailure(error);
    return;
  }

  last_update_time_ = current_time;
  // cpu_work is in nanoseconds. The denominator is the CPU time (also in nanoseconds) that the
  // allocated cores could have provided: elapsed_ms * 1e6 * (millicores / 1000).
  const double cpu_usage =
      cpu_work / (system_time_elapsed_milliseconds * 1000 * cpu_milli_cores);
  // The new utilization is calculated/smoothed using EWMA.
  utilization_ = cpu_usage * DAMPENING_ALPHA + (1 - DAMPENING_ALPHA) * utilization_;

  Server::ResourceUsage usage;
  usage.resource_pressure_ = utilization_;

  callbacks.onSuccess(usage);
  previous_cgroup_stats_ = envoy_container_stats;
}
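
Note on the arithmetic in `computeContainerCpuUsage`: the standalone sketch below restates the formula with its units spelled out. `containerCpuUsage` is a hypothetical free function, not part of the PR. Unlike the code above, it treats an idle interval (zero work) as a valid 0.0 reading rather than an error, which is an assumption on my part.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical restatement of the container-mode calculation.
// Returns the fraction of the allocated CPU used during the interval,
// or std::nullopt if the inputs cannot produce a meaningful value.
std::optional<double> containerCpuUsage(uint64_t prev_usage_ns, uint64_t cur_usage_ns,
                                        double elapsed_ms, uint64_t millicores) {
  // Compare before subtracting: the counters are unsigned, so a decrease
  // would otherwise wrap around instead of going negative.
  if (cur_usage_ns < prev_usage_ns || elapsed_ms <= 0.0 || millicores == 0) {
    return std::nullopt;
  }
  const double work_ns = static_cast<double>(cur_usage_ns - prev_usage_ns);
  // elapsed_ms * 1e6 ns/ms * (millicores / 1000) cores
  //   = elapsed_ms * 1000 * millicores nanoseconds of available CPU time.
  const double capacity_ns = elapsed_ms * 1000.0 * static_cast<double>(millicores);
  assert(capacity_ns > 0.0);
  return work_ns / capacity_ns;
}
```

For example, 2.5 s of CPU time consumed over a 5 s refresh interval with 1000 millicores allocated gives a utilization of 0.5.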

} // namespace CpuUtilizationMonitor
} // namespace ResourceMonitors
} // namespace Extensions
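
Note on the EWMA smoothing used by both modes: with `DAMPENING_ALPHA = 0.05` the reported pressure reacts slowly to a sudden spike. The sketch below counts how many refreshes the smoothed value needs to reach a threshold under a constant raw sample; `ewmaStep` and `refreshesToReach` are hypothetical helper names.

```cpp
#include <cassert>

constexpr double kDampeningAlpha = 0.05; // Same value as DAMPENING_ALPHA in the diff.

// One smoothing step, matching the update expression in the monitor.
double ewmaStep(double previous, double sample, double alpha = kDampeningAlpha) {
  assert(alpha > 0.0 && alpha <= 1.0);
  return sample * alpha + (1.0 - alpha) * previous;
}

// Number of refreshes until the smoothed value first reaches `threshold`,
// starting from `start` and feeding a constant raw `sample` each time.
int refreshesToReach(double threshold, double sample, double start = 0.0) {
  assert(sample > threshold); // Otherwise the threshold is never reached.
  int refreshes = 0;
  for (double smoothed = start; smoothed < threshold; ++refreshes) {
    smoothed = ewmaStep(smoothed, sample);
  }
  return refreshes;
}
```

Starting from zero with a raw utilization pinned at 1.0, the smoothed value crosses 0.80 after 32 refreshes and 0.95 after 59. At the 5 s `refresh_interval` from the example config, that is 160 s before the scaling threshold is reached, which is worth keeping in mind when choosing thresholds.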
@@ -1,7 +1,7 @@
#pragma once

#include <chrono>

#include "envoy/common/time.h"
#include "envoy/extensions/resource_monitors/cpu_utilization/v3/cpu_utilization.pb.h"
#include "envoy/server/resource_monitor.h"

Expand All @@ -17,14 +17,25 @@ class CpuUtilizationMonitor : public Server::ResourceMonitor {
public:
CpuUtilizationMonitor(
const envoy::extensions::resource_monitors::cpu_utilization::v3::CpuUtilizationConfig& config,
std::unique_ptr<CpuStatsReader> cpu_stats_reader);
std::unique_ptr<CpuStatsReader> cpu_stats_reader, TimeSource& time_source);

CpuUtilizationMonitor(
const envoy::extensions::resource_monitors::cpu_utilization::v3::CpuUtilizationConfig& config,
std::unique_ptr<CgroupStatsReader> cgroup_stats_reader, TimeSource& time_source);

void updateResourceUsage(Server::ResourceUpdateCallbacks& callbacks) override;
void computeHostCpuUsage(Server::ResourceUpdateCallbacks& callbacks);
void computeContainerCpuUsage(Server::ResourceUpdateCallbacks& callbacks);

private:
double utilization_ = 0.0;
CpuTimes previous_cpu_times_;
CgroupStats previous_cgroup_stats_;
std::unique_ptr<CpuStatsReader> cpu_stats_reader_;
std::unique_ptr<CgroupStatsReader> cgroup_stats_reader_;
TimeSource& time_source_;
MonotonicTime last_update_time_;
envoy::extensions::resource_monitors::cpu_utilization::v3::UtilizationComputeStrategy mode_;
};

} // namespace CpuUtilizationMonitor
Expand Up @@ -47,6 +47,45 @@ CpuTimes LinuxCpuStatsReader::getCpuTimes() {
return {true, work_time, total_time};
}


LinuxContainerCpuStatsReader::LinuxContainerCpuStatsReader(
    const std::string& linux_cgroup_cpu_allocated_file,
    const std::string& linux_cgroup_cpu_times_file)
    : linux_cgroup_cpu_allocated_file_(linux_cgroup_cpu_allocated_file),
      linux_cgroup_cpu_times_file_(linux_cgroup_cpu_times_file) {}

CgroupStats LinuxContainerCpuStatsReader::getCgroupStats() {
  std::ifstream cpu_allocated_file, cpu_times_file;
  uint64_t cpu_allocated_value, cpu_times_value;
  bool stats_valid = true;

  // Read the CPU allocation of the container.
  cpu_allocated_file.open(linux_cgroup_cpu_allocated_file_);
  if (!cpu_allocated_file.is_open()) {
    ENVOY_LOG_MISC(error, "Can't open linux cpu allocated file {}",
                   linux_cgroup_cpu_allocated_file_);
    stats_valid = false;
    cpu_allocated_value = 0;
  } else {
    cpu_allocated_file >> cpu_allocated_value;
    if (!cpu_allocated_file) {
      ENVOY_LOG_MISC(error, "Unexpected format in linux cpu allocated file {}",
                     linux_cgroup_cpu_allocated_file_);
      stats_valid = false;
      cpu_allocated_value = 0;
    }
  }

  // Read the cumulative CPU time consumed by the container.
  cpu_times_file.open(linux_cgroup_cpu_times_file_);
  if (!cpu_times_file.is_open()) {
    ENVOY_LOG_MISC(error, "Can't open linux cpu usage seconds file {}",
                   linux_cgroup_cpu_times_file_);
    stats_valid = false;
    cpu_times_value = 0;
  } else {
    cpu_times_file >> cpu_times_value;
    if (!cpu_times_file) {
      ENVOY_LOG_MISC(error, "Unexpected format in linux cpu usage seconds file {}",
                     linux_cgroup_cpu_times_file_);
      stats_valid = false;
      cpu_times_value = 0;
    }
  }

  return {stats_valid, cpu_allocated_value, cpu_times_value};
}

} // namespace CpuUtilizationMonitor
} // namespace ResourceMonitors
} // namespace Extensions
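
Note on `getCgroupStats`: the two open-and-parse branches are identical apart from the file name, so they could share one helper. The sketch below is one possible shape, assuming each cgroup file starts with a single unsigned integer (as both files read here do); `readUint64FromFile` is a hypothetical name, and logging is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: reads one unsigned integer from the start of a file.
// Returns std::nullopt if the file cannot be opened or does not start with a number.
std::optional<uint64_t> readUint64FromFile(const std::string& path) {
  assert(!path.empty());
  std::ifstream file(path);
  if (!file.is_open()) {
    return std::nullopt;
  }
  uint64_t value = 0;
  file >> value;
  if (!file) {
    return std::nullopt;
  }
  return value;
}
```

The reader could then call the helper once per file and mark the stats invalid if either result is empty.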
Expand Up @@ -10,6 +10,8 @@ namespace ResourceMonitors {
namespace CpuUtilizationMonitor {

static const std::string LINUX_CPU_STATS_FILE = "/proc/stat";
static const std::string LINUX_CGROUP_CPU_ALLOCATED_FILE = "/sys/fs/cgroup/cpu/cpu.shares";
static const std::string LINUX_CGROUP_CPU_TIMES_FILE = "/sys/fs/cgroup/cpu/cpuacct.usage";

class LinuxCpuStatsReader : public CpuStatsReader {
public:
Expand All @@ -20,6 +22,16 @@ class LinuxCpuStatsReader : public CpuStatsReader {
const std::string cpu_stats_filename_;
};

class LinuxContainerCpuStatsReader : public CgroupStatsReader {
Contributor

As mentioned above in https://github.com/envoyproxy/envoy/pull/37792/files#diff-9281e66aafccb8196311602044a32a4ac53a877d7bae55cc591df8f30ae15810R25, if we go that route this can then just inherit directly from the CpuStatsReader interface.

Author

@KBaichoo I have got rid of the time source, removed multiple redundancies, and simplified the monitor as well. Our org would not use a refresh_interval smaller than even 1 second, and Envoy's CPU overhead increases again with smaller refresh intervals. So I am sticking with the /proc/uptime stats, which have a precision of 0.01 seconds; that is more than enough for most use cases. From now on, the Linux CPU stats reader will send stats to the monitor based on the configured strategy.

public:
  LinuxContainerCpuStatsReader(
      const std::string& linux_cgroup_cpu_allocated_file = LINUX_CGROUP_CPU_ALLOCATED_FILE,
      const std::string& linux_cgroup_cpu_times_file = LINUX_CGROUP_CPU_TIMES_FILE);
  CgroupStats getCgroupStats() override;

private:
  const std::string linux_cgroup_cpu_allocated_file_;
  const std::string linux_cgroup_cpu_times_file_;
};

} // namespace CpuUtilizationMonitor
} // namespace ResourceMonitors
} // namespace Extensions
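
Note on `LINUX_CGROUP_CPU_ALLOCATED_FILE`: both paths are cgroup v1 files, and `cpu.shares` is a relative weight where 1024 corresponds to one full CPU, so its value is close to but not the same as millicores. Assuming Kubernetes derives the shares from the CPU request as `milliCPU * 1024 / 1000`, the value can be converted back as sketched below; `sharesToMilliCores` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kSharesPerCpu = 1024;     // cgroup v1 cpu.shares value for one full CPU.
constexpr uint64_t kMilliCoresPerCpu = 1000; // Millicores in one full CPU.

// Hypothetical conversion from a cgroup v1 cpu.shares value to millicores,
// rounded to the nearest millicore.
uint64_t sharesToMilliCores(uint64_t shares) {
  // Guard against overflow in the multiplication below.
  assert(shares <= (UINT64_MAX - kSharesPerCpu / 2) / kMilliCoresPerCpu);
  return (shares * kMilliCoresPerCpu + kSharesPerCpu / 2) / kSharesPerCpu;
}
```

Using the raw shares value as millicores overstates the allocation by about 2.4%, so the reported utilization comes out slightly low. Whether that matters depends on how tight the configured thresholds are.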
1 change: 1 addition & 0 deletions tools/spelling/spelling_dictionary.txt
Expand Up @@ -28,6 +28,7 @@ BEL
BBR
BIDIRECTIONAL
CCL
CGROUP
ECN
ECS
EKS