Memory pool gets corrupted when oversized allocations are returned to pool out of order #585
Here's a shortened sequence that also shows the issue (`HostPool` is the host-memory pool type described in the original report below; a sketch of its definition follows at the end):

```cpp
#include <cstring>  // std::memset

int main()
{
    HostPool pool;
    auto ptr1 = pool.allocate(43920240);
    auto ptr2 = pool.allocate(2465264);
    pool.deallocate(ptr1, 43920240);
    pool.deallocate(ptr2, 2465264);
    auto ptr3 = pool.allocate(4930528);   // served from a larger cached block
    pool.deallocate(ptr3, 4930528);
    auto ptr4 = pool.allocate(14640080);
    std::memset(ptr4, 0xff, 14640080);    // overwrites memory the pool still uses for bookkeeping
    auto crash = pool.allocate(4930528);
    pool.deallocate(crash, 4930528);
    pool.deallocate(ptr4, 14640080);
    return 0;
}
```

The pool implementation's code looks suspect. The pointer descriptor for large blocks is stored at the end of the allocation. However, suppose that we first allocate and return a large oversized block, and a later, smaller oversized request is then satisfied from that cached block. Deallocation tries to find the block descriptor at the end of the allocation as computed from the requested size, while the descriptor actually sits at the end of the larger block the memory was originally created for. Now suppose that the next allocation reuses that memory and writes over it. At this point, we put nonsense into the cached large block list. Valgrind complains here as well.
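To make the failure mode concrete, here is a minimal, self-contained sketch of the descriptor-placement problem. It is not Thrust's actual code; `block_descriptor` and `find_descriptor` are illustrative stand-ins for the pool's real bookkeeping:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Illustrative descriptor placed at the end of each oversized block,
// mimicking (not reproducing) the layout the report describes.
struct block_descriptor
{
    std::size_t size;        // size the block was originally created with
    block_descriptor* next;  // intrusive free-list link
};

// The descriptor lives at ptr + size, which is only correct if `size`
// is the size the block was *created* with.
block_descriptor* find_descriptor(void* ptr, std::size_t size)
{
    return reinterpret_cast<block_descriptor*>(static_cast<char*>(ptr) + size);
}

int main()
{
    const std::size_t original = 43920240;  // first, larger allocation
    const std::size_t smaller  = 4930528;   // later, smaller request

    char* block = static_cast<char*>(std::malloc(original + sizeof(block_descriptor)));
    block_descriptor* desc = find_descriptor(block, original);
    desc->size = original;
    desc->next = nullptr;

    // The pool now reuses `block` to satisfy the smaller request. On
    // deallocate(ptr, smaller), the lookup lands in the middle of the
    // block instead of at the real descriptor:
    block_descriptor* bogus = find_descriptor(block, smaller);
    std::printf("real descriptor at %p, lookup finds %p\n",
                static_cast<void*>(desc), static_cast<void*>(bogus));
    // Whatever bytes happen to live at `bogus` are then linked into the
    // cached-oversized list, which is the corruption the reproducer triggers.

    std::free(block);
    return 0;
}
```

The "obvious" fix mentioned in the next comment, copying the descriptor to the position the smaller size implies, would make this particular lookup succeed, but as the thread notes, it is not sufficient on its own.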
Unfortunately, doing the "obvious" thing of copying the descriptor into the correct place isn't enough to solve the problem, and I got lost in a maze of doubly-linked lists...
Hey @mkuron, thanks for the excellent bug report. The Thrust memory resource functionality hasn't had much attention recently, so I'm not too surprised that there are some lingering issues. We'll definitely look into this. In the meantime, and as an alternative, you might look into trying RMM. Despite its name, it is not specific to RAPIDS at all. RMM was actually inspired in part by Thrust's memory resource functionality and uses a nearly identical `memory_resource` interface. Here's an example from RMM using a pool resource; see the sketch below. RMM has been far more actively developed and used, so it should be much more robust. Furthermore, you may even see some performance improvements!
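The RMM example originally embedded in this comment did not survive extraction; what follows is a minimal sketch of equivalent usage rather than the original snippet. It assumes `rmm::mr::pool_memory_resource` over a `rmm::mr::cuda_memory_resource` upstream, with an explicit initial pool size (newer RMM releases require one):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
    // Upstream resource that hands out raw device memory via cudaMalloc.
    rmm::mr::cuda_memory_resource cuda_mr;

    // Pool resource drawing from the upstream; the 64 MiB initial size
    // is an illustrative assumption.
    rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
        &cuda_mr, 64UL * 1024 * 1024};

    // Same allocate/deallocate-with-size protocol as the Thrust pool.
    void* p = pool_mr.allocate(43920240);
    pool_mr.deallocate(p, 43920240);

    return 0;
}
```

As with the Thrust pool, `deallocate` must be passed the same size that was originally requested from `allocate`.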
Oh wow, you're absolutely right about this. I'm very surprised it took this long for this problem to surface, maybe just not many people actively use the functionality in the wild. I have an initial idea of how to fix this, but I'll need to prove to myself that it's workable and doesn't mess up the alignment of the returned pointer...

Thanks everyone for looking into the bug. @griwes, have you had any success implementing your initial idea?
@mkuron I believe I have a fix, can you try out this branch: https://github.com/griwes/cccl/tree/bug/pool-oversized-smaller-alloc and let me know if that fixes the original problem? I've added the minimized testcase to the test suite, but would like feedback on how this behaves in the real world situation you have.

Thanks @griwes, this patch does fix the problem in our real-world application as well.

@griwes, can we get your patch merged soon, or is there any testing that still needs to be done?
When I perform oversized (i.e. > 1 MB) allocations and deallocations from a `thrust::mr::unsynchronized_pool_resource`, the linked list used by Thrust to keep track of the oversized allocations gets corrupted when I perform the allocations and deallocations in arbitrary order. For comparison, when the order of deallocations is the exact reverse of the order of allocations, everything goes well. The code shown above (the shortened sequence from the first comment) leads to a segmentation fault.

All the allocations and deallocations come in pairs and specify matching sizes, yet the last allocation leads to a crash. Debugging inside Thrust shows that `m_cached_oversized` points to memory that looks like it has been overwritten by the `memset` calls in my sample.

I am observing this with Thrust 1.16.0, but since the memory pool implementation hasn't changed in years, I expect it also happens in Thrust 1.17.2 and 2.x. My original sample was obviously using device memory instead of `std::malloc`/`std::free`, but after I noticed where the bug lies, I was able to switch to host memory to completely eliminate CUDA and NVCC from the reproducer.
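The `HostPool` type used in the reproducer is not included in this extract. Here is a hypothetical reconstruction, assuming Thrust's `new_delete_resource` as the host-side upstream; the report only states that host memory replaces device memory:

```cpp
#include <cstddef>

#include <thrust/mr/new.h>   // thrust::mr::new_delete_resource
#include <thrust/mr/pool.h>  // thrust::mr::unsynchronized_pool_resource

// Hypothetical reconstruction of the reproducer's HostPool: a Thrust pool
// drawing from host new/delete, so the bug triggers with no CUDA at all.
// The upstream choice and this wrapper are assumptions, not the reporter's
// code; any host-side upstream exercises the same oversized-block path.
struct HostPool
{
    void* allocate(std::size_t n) { return pool.allocate(n); }
    void deallocate(void* p, std::size_t n) { pool.deallocate(p, n); }

    thrust::mr::new_delete_resource upstream;
    thrust::mr::unsynchronized_pool_resource<thrust::mr::new_delete_resource> pool{&upstream};
};
```

With a wrapper like this, the shortened reproducer above builds as a plain host program, matching the report's point that CUDA and NVCC are not needed to trigger the bug.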