CUDAMPI

CUDAMPI is a large hybrid CPU/GPU sorting network built with CUDA and MPI. It sorts with a standard quicksort on CPUs and a custom bitonic sort on GPUs. These two algorithms were the fastest in a number of prior benchmarks.
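For readers unfamiliar with bitonic sort, the compare-exchange pattern can be sketched sequentially in C++. This is not the project's custom GPU kernel; the function name `bitonic_sort` and the power-of-two length requirement are assumptions made for illustration. The innermost loop has no dependencies between iterations, which is why the algorithm maps well onto GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts `data` ascending with the iterative bitonic network.
// Assumption: data.size() is a power of two (pad the input otherwise).
void bitonic_sort(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "length must be a power of two");

    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance in this step
            // Iterations of this loop are independent of each other;
            // on a GPU, each index i would typically be handled by one thread.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    // Swap when the pair is out of order for its direction.
                    if ((data[i] > data[partner]) == ascending) {
                        std::swap(data[i], data[partner]);
                    }
                }
            }
        }
    }
}
```

Calling `bitonic_sort` on a vector of eight integers performs three merge stages (`k` = 2, 4, 8) and leaves the vector in ascending order.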

To presort the data, we run the first step of a bucket sort. Each bucket holds only numbers in a given range, and every number is placed into the bucket whose range contains it. This distribution step can run in parallel. Afterwards, each bucket can be sorted independently on either a CPU or a GPU.
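The distribution step can be sketched as follows. This is a minimal single-process illustration, assuming unsigned 32-bit keys and equal-width bucket ranges over `[lo, hi)`; the function name `partition_into_buckets` and its parameters are hypothetical and not taken from the project. In a parallel run, each worker could apply the same function to its own slice of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: distributes values from [lo, hi) into
// num_buckets equal-width ranges. Bucket i holds smaller values than bucket i + 1.
std::vector<std::vector<std::uint32_t>> partition_into_buckets(
    const std::vector<std::uint32_t>& values,
    std::uint32_t lo, std::uint32_t hi, std::size_t num_buckets) {
    assert(lo < hi && num_buckets > 0);
    const std::uint64_t span = std::uint64_t(hi) - lo;

    std::vector<std::vector<std::uint32_t>> buckets(num_buckets);
    for (std::uint32_t v : values) {
        assert(v >= lo && v < hi);
        // Scale the offset into [0, num_buckets); 64-bit math avoids overflow.
        const std::size_t idx =
            static_cast<std::size_t>((std::uint64_t(v - lo) * num_buckets) / span);
        buckets[idx].push_back(v);
    }
    return buckets;
}
```

Because the bucket ranges are disjoint and ordered, sorting each bucket on its own and concatenating the results in bucket order yields a fully sorted sequence.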

The sorting network uses the filesystem for process management, so no explicit locks are required. MPI-IO is used to write to the filesystem in parallel. The result is a set of sorted files.