rdma: add separate bounce buffer freelist for data (eager) messages #614
@@ -239,18 +239,31 @@ OFI_NCCL_PARAM_INT(disable_dmabuf, "DISABLE_DMABUF", 0);
 OFI_NCCL_PARAM_UINT(min_stripe_size, "MIN_STRIPE_SIZE", (64 * 1024));
 
 /*
- * Minimum bounce buffers posted per endpoint. The plugin will attempt to post
+ * Minimum ctrl recv buffers posted per rail. The plugin will attempt to post
+ * more buffers if we dip below this threshold, allocating new buffers if needed.
+ */
+OFI_NCCL_PARAM_INT(rdma_min_posted_ctrl_recv_buffers, "RDMA_MIN_POSTED_CTRL_RECV_BUFFERS", 64);
Shouldn't this be a function of max outstanding requests, which today we have at 128?

That's a bit tricky, because the outstanding requests max (
That said, we may need to tune this value, but the current value in this PR is already significantly higher than what is in master today (which is 16-32 per rail, and shared between eager and ctrl recv buffers).
+
+/*
+ * Maximum ctrl recv buffers posted per rail. The plugin will not attempt to
+ * post more buffers if we reach this threshold, returning available buffers to
+ * the free list if needed
+ */
+OFI_NCCL_PARAM_INT(rdma_max_posted_ctrl_recv_buffers, "RDMA_MAX_POSTED_CTRL_RECV_BUFFERS", 128);
+
+/*
+ * Minimum (eager) bounce buffers posted per rail. The plugin will attempt to post
  * more bounce buffers if we dip below this threshold, allocating new bounce
  * buffers if needed.
  */
-OFI_NCCL_PARAM_INT(rdma_min_posted_bounce_buffers, "RDMA_MIN_POSTED_BOUNCE_BUFFERS", 64);
+OFI_NCCL_PARAM_INT(rdma_min_posted_bounce_buffers, "RDMA_MIN_POSTED_BOUNCE_BUFFERS", 16);
bwbarrett marked this conversation as resolved.
 
 /*
- * Maximum bounce buffers posted per endpoint. The plugin will not attempt to
+ * Maximum (eager) bounce buffers posted per rail. The plugin will not attempt to
  * post more bounce buffers if we reach this threshold, returning available
  * buffers to the free list if needed
  */
-OFI_NCCL_PARAM_INT(rdma_max_posted_bounce_buffers, "RDMA_MAX_POSTED_BOUNCE_BUFFERS", 128);
+OFI_NCCL_PARAM_INT(rdma_max_posted_bounce_buffers, "RDMA_MAX_POSTED_BOUNCE_BUFFERS", 32);
 
 /*
  * Internode network latency reported to NCCL. Defaults to 0, unless the configured
per what rail? device, endpoint, etc?
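
For readers following the thread, the min/max semantics described in the parameter comments above can be illustrated with a small sketch. This is not the plugin's code: `struct rail_bufs`, `buffers_to_post`, and `should_repost` are hypothetical names, and the choice to refill all the way up to the maximum once the count dips below the minimum is an assumption that the comments do not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rail bookkeeping; names are illustrative only and are
 * not taken from the plugin source. */
struct rail_bufs {
	size_t num_posted; /* buffers currently posted on this rail */
	size_t min_posted; /* e.g. RDMA_MIN_POSTED_BOUNCE_BUFFERS */
	size_t max_posted; /* e.g. RDMA_MAX_POSTED_BOUNCE_BUFFERS */
};

/* Number of buffers to post now. Nothing is posted while the rail is at or
 * above the minimum. Below the minimum, this sketch assumes a refill up to
 * the maximum. */
size_t buffers_to_post(const struct rail_bufs *r)
{
	if (r->num_posted >= r->min_posted)
		return 0;
	return r->max_posted - r->num_posted;
}

/* After a posted buffer has been consumed and processed: returns 1 if it
 * should be reposted, 0 if it should go back to the free list because the
 * rail has already reached the maximum. */
int should_repost(const struct rail_bufs *r)
{
	return r->num_posted < r->max_posted;
}
```

With the eager defaults from this PR (minimum 16, maximum 32), a rail that has dropped to 10 posted buffers would post 22 more under this assumption, while a rail sitting at 32 would return a completed buffer to the free list instead of reposting it.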