Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11511. Introduce metrics in deletion services of OM #7377

Merged
merged 23 commits into from
Jan 8, 2025

Conversation

Tejaskriya
Copy link
Contributor

@Tejaskriya Tejaskriya commented Oct 30, 2024

What changes were proposed in this pull request?

The following metrics are added in OM's deletingServices:
KeyDeleting
No of keys processed for delete since last restart
No of keys sent to scm for delete and response(success/fail) since last restart

DirDeleting
No of dirs/empty dirs processed for delete since last restart
No of subfiles processed for delete since last restart

Dirpurge
No of dirs purged since last restart
No of subfiles purged since last restart

Keypurge
No of keys purged since last restart

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11511

How was this patch tested?

Tested manually in docker at the jmx endpoint http://localhost:9874/jmx?qry=Hadoop:service=OzoneManager,name=DeletingServiceMetrics :

{
  "beans" : [ {
    "name" : "Hadoop:service=OzoneManager,name=DeletingServiceMetrics",
    "modelerType" : "DeletingServiceMetrics",
    "tag.Context" : "ozone",
    "tag.Hostname" : "7c4d4db221d4",
    "NumDirsPurged" : 10,
    "NumDirsSentForPurge" : 10,
    "NumKeysProcessed" : 0,
    "NumKeysPurged" : 0,
    "NumKeysSentForPurge" : 0,
    "NumSubDirsMovedToDeletedDirTable" : 9,
    "NumSubDirsSentForPurge" : 0,
    "NumSubFilesMovedToDeletedTable" : 3,
    "NumSubFilesSentForPurge" : 3
  } ]
}

@Tejaskriya Tejaskriya marked this pull request as ready for review November 4, 2024 08:33
@errose28 errose28 self-requested a review November 4, 2024 17:39
@Tejaskriya
Copy link
Contributor Author

@ashishkumar50 thank you for the reviews, I have addressed the comments, could you please review the update?

@Tejaskriya
Copy link
Contributor Author

@ashishkumar50 I have addressed all the comments, thank you for the review.
@errose28 could you please review the patch?

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz check for review comments

public void setDirectoryDeletionIterationMetrics(long runcount, long startTime, long duration,
long dirDel, long subdirDel,
long filesMove, long subdirMove) {
setIterationDirRunCount(runcount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not need capture runCount and startTime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong with capturing run count. It has been useful in the past to detect when the service has completed a run after a restart, and can be divided over past time intervals to get average time taken per run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@errose28 I am not getting what we are going to achieve with runCount as metrics. If there are 10 threads running, it will be increase by 10 times as default.
We can use duration as metrics for performance, but not the startTime. This will give better metrics. So startTime may not be useful, this can be easily identified as Timer task.

For task running, based on delete count increase, and other data can show the progress. So Above metric is not useful.

@Tejaskriya Tejaskriya marked this pull request as draft December 3, 2024 06:26
@Tejaskriya Tejaskriya marked this pull request as ready for review December 10, 2024 09:27
@Tejaskriya
Copy link
Contributor Author

Tejaskriya commented Dec 17, 2024

@sumitagrawl @ashishkumar50 @errose28 I have updated the PR to have only restart metrics and removed the iteration specific ones.
If we determine that iteration specific metrics are needed uniformly across all the services, I can raise a different PR for adding it in all roles OM, SCM and DN.
The aim of this PR is focused to add the missing metrics in OM only. Could you please review the updated PR?

@@ -456,6 +460,7 @@ public long optimizeDirDeletesAndSubmitRequest(long remainNum,
" totalRunCount: {}",
dirNum, subdirDelNum, subFileNum, (subDirNum - subdirDelNum),
Time.monotonicNow() - startTime, rnCnt);
metrics.incrementDirectoryDeletionTotalMetrics(dirNum + subdirDelNum, subdirMoved, subFileNum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate if {deletedDirsCount, movedDirsCount, movedFilesCount} as variable, PurgeDir metrics {NumSubDirectoriesMoved, NumSubFilesMoved, NumDirPurged} and Metrics here {dirDel, dirMove, fileMove} are same value, else this will create confusion between metrics, and also with logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I have verified the numbers using integration tests and updated the PR with some checks in the tests.

@Tejaskriya
Copy link
Contributor Author

@sumitagrawl @ashishkumar50 When a delete is performed with skipTrash, the logs and metrics show the exact number of deletes as the user perceives. But in the case we don't skipTrash, due to a checkpoint directory created which is also deleted, the count of deleted directories is +1 -- which is reflected in both the pre-existing logs and metrics added in this PR.

All comments have been addressed. Can you please review the updated PR?

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sumitagrawl sumitagrawl merged commit a1324b6 into apache:master Jan 8, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants