Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

controller: Add cloudwatch logging for pods #882

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

ecpullen
Copy link
Contributor

@ecpullen ecpullen commented Jan 12, 2024

Issue number:

N/A

Description of changes:

Add an optional flag to the controller for archiving pod logs with CloudWatch.

Testing done:

Verified that logs are archived when the flag is set and not if the flag is not set. Tested archival for completed pods and interrupted pods.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Err(e) => {
let service_error = e.into_service_error();
if service_error.is_resource_already_exists_exception() {
info!("Log group already exists.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a little noisy?

Suggested change
info!("Log group already exists.")
debug!("testsys Cloudwatch Log group already exists.")

.context(error::NoLogsSnafu { pod: pod_name })
}

pub(crate) async fn archive_logs(k8s_client: kube::Client, job_name: &str) -> JobResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this happen? Is it only after the test has completed or failed? Or can this be kicked off at the start of the pod and the logs stream in realtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens as a job is deleted. I wasn't able to find a solution for realtime logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be something "inside" the system, but there are likely plenty of solutions for sending pod logs in realtime, so we should consider using those as well. Not going to block the PR though as we need to get something going for this.

@ecpullen ecpullen marked this pull request as ready for review January 25, 2024 22:41
#[snafu(display("Unable to create log event '{}': {:?}", log_event, source))]
CreateLogEvent {
log_event: String,
source: aws_sdk_cloudwatchlogs::types::SdkError<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, since we've run into it before, will this unwrap the error fully to something useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will show the underlying error since it is using debug.

Ok(name)
}

async fn pod_logs(k8s_client: kube::Client, pod_name: &str) -> JobResult<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the testsys model already have a way to get logs? Why do we need a new, different way here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are accessing the logs from different interfaces. (ResourceInterface vs TestManager)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should backlog an issue to unify this so that log fetching updates do not require updating 2 places. Not necessary for this change though

@ecpullen ecpullen merged commit f358f54 into bottlerocket-os:develop Jan 29, 2024
3 checks passed
@ecpullen ecpullen deleted the controller-logger branch January 29, 2024 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants