Memory pool gets corrupted when oversized allocations are returned to pool out of order #585
Here's a shortened sequence that also shows the issue (`HostPool` is the host-memory pool type described in the original report below; a sketch of its definition follows at the end):

```cpp
#include <cstring>  // std::memset

int main()
{
    HostPool pool;
    auto ptr1 = pool.allocate(43920240);
    auto ptr2 = pool.allocate(2465264);
    pool.deallocate(ptr1, 43920240);
    pool.deallocate(ptr2, 2465264);
    auto ptr3 = pool.allocate(4930528);   // served from a larger cached block
    pool.deallocate(ptr3, 4930528);
    auto ptr4 = pool.allocate(14640080);
    std::memset(ptr4, 0xff, 14640080);    // overwrites memory the pool still uses for bookkeeping
    auto crash = pool.allocate(4930528);
    pool.deallocate(crash, 4930528);
    pool.deallocate(ptr4, 14640080);
    return 0;
}
```

The pool implementation's code looks suspect. The pointer descriptor for large blocks is stored at the end of the allocation. However, suppose that we first allocate and return a large oversized block, and a later, smaller oversized request is then satisfied from that cached block. Deallocation tries to find the block descriptor at the end of the allocation as computed from the requested size, while the descriptor actually sits at the end of the larger block the memory was originally created for. Now suppose that the next allocation reuses that memory and writes over it. At this point, we put nonsense into the cached large block list. Valgrind complains here as well.
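To make the failure mode concrete, here is a minimal, self-contained sketch of the descriptor-placement problem. It is not Thrust's actual code; `block_descriptor` and `find_descriptor` are illustrative stand-ins for the pool's real bookkeeping:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Illustrative descriptor placed at the end of each oversized block,
// mimicking (not reproducing) the layout the report describes.
struct block_descriptor
{
    std::size_t size;        // size the block was originally created with
    block_descriptor* next;  // intrusive free-list link
};

// The descriptor lives at ptr + size, which is only correct if `size`
// is the size the block was *created* with.
block_descriptor* find_descriptor(void* ptr, std::size_t size)
{
    return reinterpret_cast<block_descriptor*>(static_cast<char*>(ptr) + size);
}

int main()
{
    const std::size_t original = 43920240;  // first, larger allocation
    const std::size_t smaller  = 4930528;   // later, smaller request

    char* block = static_cast<char*>(std::malloc(original + sizeof(block_descriptor)));
    block_descriptor* desc = find_descriptor(block, original);
    desc->size = original;
    desc->next = nullptr;

    // The pool now reuses `block` to satisfy the smaller request. On
    // deallocate(ptr, smaller), the lookup lands in the middle of the
    // block instead of at the real descriptor:
    block_descriptor* bogus = find_descriptor(block, smaller);
    std::printf("real descriptor at %p, lookup finds %p\n",
                static_cast<void*>(desc), static_cast<void*>(bogus));
    // Whatever bytes happen to live at `bogus` are then linked into the
    // cached-oversized list, which is the corruption the reproducer triggers.

    std::free(block);
    return 0;
}
```

The "obvious" fix mentioned in the next comment, copying the descriptor to the position the smaller size implies, would make this particular lookup succeed, but as the thread notes, it is not sufficient on its own.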
Unfortunately, doing the "obvious" thing of copying the descriptor into the correct place isn't enough to solve the problem, and I got lost in a maze of doubly-linked lists...
Hey @mkuron, thanks for the excellent bug report. The Thrust memory resource functionality hasn't had much attention recently, so I'm not too surprised that there are some lingering issues. We'll definitely look into this. In the meantime, and as an alternative, you might look into trying RMM. Despite its name, it is not specific to RAPIDS at all. RMM was actually inspired in part by Thrust's memory resource functionality and uses a nearly identical `memory_resource` interface. Here's an example from RMM using a pool resource; see the sketch below. RMM has been far more actively developed and used, so it should be much more robust. Furthermore, you may even see some performance improvements!
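The RMM example originally embedded in this comment did not survive extraction; what follows is a minimal sketch of equivalent usage rather than the original snippet. It assumes `rmm::mr::pool_memory_resource` over a `rmm::mr::cuda_memory_resource` upstream, with an explicit initial pool size (newer RMM releases require one):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
    // Upstream resource that hands out raw device memory via cudaMalloc.
    rmm::mr::cuda_memory_resource cuda_mr;

    // Pool resource drawing from the upstream; the 64 MiB initial size
    // is an illustrative assumption.
    rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
        &cuda_mr, 64UL * 1024 * 1024};

    // Same allocate/deallocate-with-size protocol as the Thrust pool.
    void* p = pool_mr.allocate(43920240);
    pool_mr.deallocate(p, 43920240);

    return 0;
}
```

As with the Thrust pool, `deallocate` must be passed the same size that was originally requested from `allocate`.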
Oh wow, you're absolutely right about this. I'm very surprised it took this long for this problem to surface, maybe just not many people actively use the functionality in the wild. I have an initial idea of how to fix this, but I'll need to prove to myself that it's workable and doesn't mess up the alignment of the returned pointer...

Thanks everyone for looking into the bug. @griwes, have you had any success implementing your initial idea?
@mkuron I believe I have a fix, can you try out this branch: https://github.com/griwes/cccl/tree/bug/pool-oversized-smaller-alloc and let me know if that fixes the original problem? I've added the minimized testcase to the test suite, but would like feedback on how this behaves in the real world situation you have.

Thanks @griwes, this patch does fix the problem in our real-world application as well.

@griwes, can we get your patch merged soon, or is there any testing that still needs to be done?
When I perform oversized (i.e. > 1 MB) allocations and deallocations from a `thrust::mr::unsynchronized_pool_resource`, the linked list used by Thrust to keep track of the oversized allocations gets corrupted when I perform the allocations and deallocations in arbitrary order. For comparison, when the order of deallocations is the exact reverse of the order of allocations, everything goes well. The code shown above (the shortened sequence from the first comment) leads to a segmentation fault.

All the allocations and deallocations come in pairs and specify matching sizes, yet the last allocation leads to a crash. Debugging inside Thrust shows that `m_cached_oversized` points to memory that looks like it has been overwritten by the `memset` calls in my sample.

I am observing this with Thrust 1.16.0, but since the memory pool implementation hasn't changed in years, I expect it also happens in Thrust 1.17.2 and 2.x. My original sample was obviously using device memory instead of `std::malloc`/`std::free`, but after I noticed where the bug lies, I was able to switch to host memory to completely eliminate CUDA and NVCC from the reproducer.
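The `HostPool` type used in the reproducer is not included in this extract. Here is a hypothetical reconstruction, assuming Thrust's `new_delete_resource` as the host-side upstream; the report only states that host memory replaces device memory:

```cpp
#include <cstddef>

#include <thrust/mr/new.h>   // thrust::mr::new_delete_resource
#include <thrust/mr/pool.h>  // thrust::mr::unsynchronized_pool_resource

// Hypothetical reconstruction of the reproducer's HostPool: a Thrust pool
// drawing from host new/delete, so the bug triggers with no CUDA at all.
// The upstream choice and this wrapper are assumptions, not the reporter's
// code; any host-side upstream exercises the same oversized-block path.
struct HostPool
{
    void* allocate(std::size_t n) { return pool.allocate(n); }
    void deallocate(void* p, std::size_t n) { pool.deallocate(p, n); }

    thrust::mr::new_delete_resource upstream;
    thrust::mr::unsynchronized_pool_resource<thrust::mr::new_delete_resource> pool{&upstream};
};
```

With a wrapper like this, the shortened reproducer above builds as a plain host program, matching the report's point that CUDA and NVCC are not needed to trigger the bug.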