Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory pool gets corrupted when oversized allocations are returned to pool out of order #585

Closed
mkuron opened this issue Jun 21, 2023 · 9 comments · Fixed by #1232
Closed
Assignees
Labels
bug Something isn't working right. thrust For all items related to Thrust.

Comments

@mkuron
Copy link

mkuron commented Jun 21, 2023

When I perform oversized (i.e. > 1 MB) allocations and deallocations from a thrust::mr::unsynchronized_pool_resource, the linked list used by Thrust to keep track of the oversized allocations seems to get corrupted when I perform the allocations and deallocations in arbitrary order. For comparison, when the order of deallocations is the exact reverse of the order of allocations, everything seems to go well.

This piece of code leads to a segmentation fault:

#include <thrust/mr/pool.h>

#include <cstdlib>
#include <memory>
#include <string>

class HostPool
{
public:
  HostPool()
  {
    auto options = Pool::get_default_options();
    options.cache_oversized = true;
    _pool = std::make_unique<Pool>(options);
  }

  void * allocate(std::size_t mem_size)
  {
    return _pool->do_allocate(mem_size, THRUST_MR_DEFAULT_ALIGNMENT);
  }

  void deallocate(void *ptr, std::size_t size)
  {
    _pool->do_deallocate(ptr, size, THRUST_MR_DEFAULT_ALIGNMENT);
  }

private:
  class HostResource : public thrust::mr::memory_resource<>
  {
  public:
    void * do_allocate(std::size_t bytes, std::size_t alignment = THRUST_MR_DEFAULT_ALIGNMENT) override
    {
      return std::malloc(bytes);
    }

    void do_deallocate(void * p, std::size_t bytes,
                       std::size_t alignment = THRUST_MR_DEFAULT_ALIGNMENT) override
    {
      std::free(p);
    }
  };

  using Pool = thrust::mr::unsynchronized_pool_resource<HostResource>;

  std::unique_ptr<Pool> _pool;
};

int main()
{
  HostPool pool;

  auto ptr1 = pool.allocate(7320040);
  std::memset(ptr1, 0xff, 7320040);
  auto ptr2 = pool.allocate(14640080);
  std::memset(ptr2, 0xff, 14640080);
  pool.deallocate(ptr1, 7320040);
  pool.deallocate(ptr2, 14640080);
  auto ptr3 = pool.allocate(2465264);
  std::memset(ptr3, 0xff, 2465264);
  auto ptr4 = pool.allocate(2465264);
  std::memset(ptr4, 0xff, 2465264);
  pool.deallocate(ptr3, 2465264);
  pool.deallocate(ptr4, 2465264);
  auto ptr5 = pool.allocate(2465264);
  std::memset(ptr5, 0xff, 2465264);
  pool.deallocate(ptr5, 2465264);
  auto ptr6 = pool.allocate(2465264);
  std::memset(ptr6, 0xff, 2465264);
  auto ptr7 = pool.allocate(2465264);
  std::memset(ptr7, 0xff, 2465264);
  auto ptr8 = pool.allocate(2465264);
  std::memset(ptr8, 0xff, 2465264);
  pool.deallocate(ptr6, 2465264);
  auto ptr9 = pool.allocate(14791584);
  std::memset(ptr9, 0xff, 14791584);
  auto ptr10 = pool.allocate(4930528);
  std::memset(ptr10, 0xff, 4930528);
  pool.deallocate(ptr7, 2465264);
  pool.deallocate(ptr8, 2465264);
  pool.deallocate(ptr9, 14791584);
  pool.deallocate(ptr10, 4930528);
  auto ptr11 = pool.allocate(21960120);
  std::memset(ptr11, 0xff, 21960120);
  auto ptr12 = pool.allocate(2465264);
  std::memset(ptr12, 0xff, 2465264);
  auto ptr13 = pool.allocate(43920240);
  std::memset(ptr13, 0xff, 43920240);
  pool.deallocate(ptr11, 21960120);
  pool.deallocate(ptr12, 2465264);
  auto ptr14 = pool.allocate(2465264);
  std::memset(ptr14, 0xff, 2465264);
  pool.deallocate(ptr13, 43920240);
  pool.deallocate(ptr14, 2465264);
  auto ptr15 = pool.allocate(14640080);
  std::memset(ptr15, 0xff, 14640080);
  auto ptr16 = pool.allocate(4930528);
  std::memset(ptr16, 0xff, 4930528);
  pool.deallocate(ptr15, 14640080);
  pool.deallocate(ptr16, 4930528);
  auto ptr17 = pool.allocate(14640080);
  std::memset(ptr17, 0xff, 14640080);

  auto crash = pool.allocate(4930528);

  std::memset(crash, 0xff, 4930528);
  pool.deallocate(crash, 4930528);
  pool.deallocate(ptr17, 14640080);
  return 0;
}

All the allocations and deallocations here are in pairs and specify matching sizes, yet the last allocation leads to a crash. Debugging inside Thrust shows that m_cached_oversized points to memory that looks like it has been overwritten by the memset calls in my sample.

I am observing this with Thrust 1.16.0, but since the memory pool implementation hasn't changed in years, I expect it also happens in Thrust 1.17.2 and 2.x. My original sample was obviously using device memory instead of std::malloc/std::free, but after I noticed where the bug is, I was able to switch to host memory to completely eliminate CUDA and NVCC from the reproducer.

@wence-
Copy link
Contributor

wence- commented Jul 10, 2023

Here's a shortened sequence that also shows the issue:

int main()
{
  HostPool pool;

  auto ptr1 = pool.allocate(43920240);
  auto ptr2 = pool.allocate(2465264);
  pool.deallocate(ptr1, 43920240);
  pool.deallocate(ptr2, 2465264);
  auto ptr3 = pool.allocate(4930528);
  pool.deallocate(ptr3, 4930528);
  auto ptr4 = pool.allocate(14640080);
  std::memset(ptr4, 0xff, 14640080);

  auto crash = pool.allocate(4930528);

  pool.deallocate(crash, 4930528);
  pool.deallocate(ptr4, 14640080);
  return 0;
}

This code:

https://github.com/NVIDIA/thrust/blob/24486a169a62a58ef8f824d3dc9613c006b6f5a7/thrust/mr/pool.h#L312-L315

looks suspect. The pointer descriptor for large blocks is stored at the end of the allocation. However, suppose that we first allocate a block of size N. So we malloc N + sizeof(descriptor) bytes. Then our user code memsets N bytes. Now we deallocate and return the block to the pool.

Deallocation tries to find the block descriptor by looking at N bytes past the start of the pointer (and finds it...).

Now suppose that the next allocation is for P < N bytes, so we grab the just released block. But now, when we deallocate this new pointer, we try and find the block descriptor at P bytes past the start of the pointer. But that is not where the descriptor lives (since do_allocate doesn't move the location of the descriptor if it find a valid block).

At this point, we put nonsense into the cached large block list. Valgrind complains with:

==32283== Conditional jump or move depends on uninitialised value(s)
==32283==    at 0x10AE24: thrust::mr::unsynchronized_pool_resource<HostPool::HostResource>::do_allocate(unsigned long, unsigned long) (pool.h:270)
==32283==    by 0x10A8DA: HostPool::allocate(unsigned long) (thrust-mr-bug.cpp:18)
==32283==    by 0x10A468: main (thrust-mr-bug.cpp:57)

Unfortunately, doing the "obvious" thing of copying the descriptor into the correct place isn't enough to solve the problem, and I got lost in a maze of doubly-linked lists...

@jrhemstad
Copy link
Collaborator

Hey @mkuron thanks for the excellent bug report.

The Thrust memory resource functionality hasn't had much attention recently, so I'm not too surprised that there are some lingering issues. We'll definitely look into this.

In the meantime, and as an alternative, you might look into trying RMM. Despite it's name, it is not specific to RAPIDS at all. RMM was actually inspired in part by Thrust's memory resource functionality. It uses a nearly identical device_memory_resource polymorphic base class. The only real difference is that allocate and deallocate also take a cudaStream_t argument.

Here's an example from RMM using a device_memory_resource with a Thrust execution policy: https://github.com/rapidsai/rmm/blob/branch-23.08/include/rmm/exec_policy.hpp

RMM has been far more actively developed and used and so it should be much more robust. Furthermore, you may even see some performance improvements!

@griwes
Copy link
Collaborator

griwes commented Jul 10, 2023

Oh wow, you're absolutely right about this. I'm very surprised it took this long for this problem to surface, maybe just not many people actively use the functionality in the wild.

I have an initial idea of how to fix this, but I'll need to prove to myself that it's workable and doesn't mess up the alignment of the returned pointer...

@griwes griwes self-assigned this Jul 10, 2023
@mkuron
Copy link
Author

mkuron commented Aug 1, 2023

Thanks everyone for looking into the bug. @griwes, have you had any success implementing your initial idea?

@jrhemstad
Copy link
Collaborator

Howdy @mkuron, sorry about the delay, we've been pretty preoccupied with moving Thrust to a new repo (I'm moving this issue now). @griwes will be looking into this and getting back to you soon.

@jrhemstad jrhemstad transferred this issue from NVIDIA/thrust Oct 19, 2023
@github-project-automation github-project-automation bot moved this to Todo in CCCL Oct 19, 2023
@github-actions github-actions bot added the needs triage Issues that require the team's attention label Oct 19, 2023
@github-actions

This comment was marked as outdated.

@griwes griwes added thrust For all items related to Thrust. bug Something isn't working right. and removed needs triage Issues that require the team's attention labels Oct 20, 2023
@griwes griwes moved this from Todo to In Progress in CCCL Oct 20, 2023
@griwes
Copy link
Collaborator

griwes commented Oct 24, 2023

@mkuron I believe I have a fix, can you try out this branch: https://github.com/griwes/cccl/tree/bug/pool-oversized-smaller-alloc and let me know if that fixes the original problem? I've added the minimized testcase to the test suite, but would like feedback on how this behaves in the real world situation you have.

@mkuron
Copy link
Author

mkuron commented Nov 6, 2023

Thanks @griwes, this patch does fix the problem in our real-world application as well.

@mkuron
Copy link
Author

mkuron commented Dec 4, 2023

@griwes, can we get your patch merged soon, or is there any testing that still needs to be done?

@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Dec 19, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right. thrust For all items related to Thrust.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants