Commit
* Config tuning and dynamic dispatch for device segmented radix sort
* add script to autotune:build job, to remove executables if the size is too large for artifact
* fix ci script
* fix typo
* autotune:execute-tuning checks if build artifact has executables
* fix incrementing in script
* fix checking the executables
* fix review comments
* fix ci script
* fix printing error by usage using multiline yaml
* test ci script
* let's try the backslash
* this should work now
* move autotune:execute-tuning to use multiline yaml for script
* docs(partition_two_way): add partition_two_way to sphinx
* docs(batch_memcpy): add batch_memcpy to sphinx
* docs(intrinsics): add match_any and group_elect to sphinx
* docs(memcpy): improve consistency with other pages
* docs: Migrate to using rocm-docs-core with the extension config
* docs: Declare TOCs in _toc.yml.in
  This fixes the warnings given by sphinx_external_toc. Be explicit and add toc `tableofcontents` directives where the TOCs should be inserted. See https://github.com/executablebooks/sphinx-external-toc#add-a-toc-to-a-pages-content for more info.
* Add memcpy to the summary of operations
* Add exclusive_scan interfaces without initial value to warp_scan
  CUB has these, therefore hipCUB needs them too. Currently these are being worked around in hipCUB by using undocumented APIs of rocPRIM (`to_exclusive`). See for example [warp_scan.hpp:107 in hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/f459480f78164328214b75b16ffef338f1d4bc89/hipcub/include/hipcub/backend/rocprim/warp/warp_scan.hpp#L107)
* Add tests for warp exclusive_scan without initial value
  The tests are based on the current tests modified to skip checking the first value of each warp.
* Add warp_scan::exclusive_scan overloads without initial value to CHANGELOG
* Add too large logical warp runtime errors to the new warp_scan function
* Improve documentation of warp_scan::exclusive_scan without initial value
  - Add the no initial value part to the brief description
  - Hide the `enable_if` and the overloads required for runtime errors from the docs.
* fix: Fixed doxygen warning in device_config_helper.hpp
* style: Minor edits to warp_exclusive_scan without init
* fix(test_device_adjacent_difference.cpp): fixed unused variable warning
* Consistent doxygen parameters in the new excl. scan APIs
* Fix, simplify warp_id & lane_id variables in warp_exclusive_scan without init
* Fix MSVC warning due to deprecated getenv
* Fix formatting in test_warp_scan.hpp
* clang-format
* Fix linker issues for test debug compilation
* Use shuffle instead of shuffle_random in hipgraph test
* style(device_scan.hpp): use chevron-style ('<<<...>>>') kernel launching
* fix(device_scan.hpp): derive the intermediate accumulator type from the scan operator instead of the initial value/input type
  This reduces the number of type conversions and makes the accuracy of the operation directly dependent on the chosen binary operator.
* fix(device_scan_by_key.hpp): derive the intermediate accumulator type from the scan operator instead of the initial value/input type
* test(test_device_scan.cpp): scan tests derive accumulator type from output of operator on device and host
* test(test_device_scan.cpp): scan by key tests derive accumulator type from output of operator on device and host
* Disable __int128_t tests on platforms without support
* docs(device_scan.hpp): reflect device scan accumulator changes in documentation
* fix(device_scan.hpp): use rocprim::detail::match_result_type instead of std::result_of
  This fixes compile time errors where the resulting type cannot be derived from device-only lambdas and functors.
* fix(device_scan.hpp,device_scan_by_key.hpp): revert default accumulator type in scan algorithms back to using result type
* revert(test_device_scan.cpp): scan by key tests derive accumulator type from output of operator on device and host
  This reverts commit 5bb5b16.
* feat(device_scan.hpp): added an optional type parameter for the accumulator type in scan algorithms
  By default the accumulator type is based on the scan operator. This is the intended behaviour for hipCUB, but rocThrust still bases this on the value type of the input iterator. To accommodate both requirements, the accumulator type had to be exposed.
* revert(test_device_scan.cpp): scan tests derive accumulator type from output of operator on device and host
  This reverts commit 0d2d482.
* docs(device_scan.hpp,device_scan_by_key.hpp): updated the documentation to include the optional type parameter for the accumulator in scan algorithms
* revert(device_scan.hpp): reflect device scan accumulator changes in documentation
  This reverts commit 9d0ab0e.
* style: improve formatting
* docs(changelog.md): reflect changes to intermediate type in changelog
* docs(changelog.md): update the changelog to include the addition of the optional accumulator type in scan algorithms
* fix(device_scan.hpp,device_scan_by_key.hpp): use initial value for accumulator for exclusive scan
* docs(device_scan.hpp,device_scan_by_key.hpp): update accumulator type parameter documentation
* style: update copyright
* style(test_device_scan.cpp): fix formatting
* feat(type_traits.hpp): expose 'invoke_result' and 'invoke_result_binary_op'
  These were previously internal functions.
* removed unused code warnings in benchmarks and added warning compiler flags to gitlab ci for benchmarks
* docs: improve documentation
* style: update copyright and fix style
* Formatting and copyright data changes
* refactor(match_result_type.hpp,type_traits.hpp): moved implementation details of invoke_result to type_traits.hpp
* test(test_type_traits.cpp): added tests for 'invoke_result'
* refactor(test_invoke_result.cpp): rename from 'test_type_traits' to 'test_invoke_result'
* style: update copyright dates
* test(test_invoke_result.cpp): tests also cover device-only functions
* feat(type_traits.hpp): add c++17-styled aliases for 'invoke_result' and 'invoke_result_binary_op'
* style: update style
* Removed redundant inheritance in device templates
* refactor(test_invoke_result.cpp): use fixed-width integer types
* Fixed linting
* declare the return type of lambda used in adjacent difference, to avoid compile errors
* Fixed warp_exchange blocked_to_striped_shuffle and striped_to_blocked_shuffle
  The logical warp size was not passed to the shuffle operation, therefore only the first logical warp in the block was executed.
* add new api calls for device_adjacent_difference
* Improved warp_exchange test suite
  Multiple logical warps are executed per test. Added tests with 2 and 8 byte value types.
* rename new api function to avoid overload and make things clearer
* Updated copyright dates
* fix call not changed in test
* Updated changelog
* fix review comments
  update docs comments
  refactor test_device_adjacent_difference in_place part
* add test cases for device_adjacent_difference to check for input iterators not returning value_type for operator[]
* fix format check large index test
* fix format and merge errors
* fix review comments
  rename to indirect iterator
  simplify indirect iterator
* change adjacent_difference_alias to adjacent_difference_inplace
* fix review comments
  have separate code paths for non aliased and in place calls in tests
  documentation updates
* Fix unique_by_key to allow input and output values iterators aliasing
* fix rocm 6.0 compilation errors
  add api variant logging to other test
* Help compiler optimize unused value_type
* refactor(detail/various.hpp): Perfect forwarding for foreach_in_tuple
  Instead of taking l-value references, use std::forward to forward the value type to the passed function. For example, this allows foreach_in_tuple to be called on const tuples.
* build(CMakeLists.txt): Skip packaging when project isn't toplevel
  Skip packaging when we're being added as a sub-project (for example using FetchContent). Only a single project can use `rocm_create_package()`; we don't want to trump over whoever is depending on us. This should fix the warnings like "rocm_package_add_deb_dependencies called after rocm_create_package!" in hipCUB (and probably rocThrust).
* CHANGELOG and copyright updates
* refactor(detail/temp_storage.hpp): Make layout() const on partitions
* build(cmake): Add cmake option to disable installation
  Default to ON for backward compatibility
* refactor: Further improve foreach_in_tuple
  - Use an array instead of an (implicit) initializer_list.
  - Be consistent with the template parameter name
* build(CHANGELOG.md, CMakeLists.txt): Set version number in CMake
* docs: Fix some documentation warnings/errors
* docs: Fixed changelog style
* refactor(test_device_adjacent_difference): Reduce code duplication
  Simplify code by factoring out common parts of in-place and out-of-place tests.
* docs: Fixed SPHINX_DIR
* refactor(warp_scan): Deprecate undocumented to_exclusive APIs
  These were used by prior versions of CUB, but now have public replacements.
* Add initial Windows CI
* Substituted DeviceSelectWarpSize with device_test_enabled_for_warp_size_v
* Documentation fix after rebase

---------

Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara Prasetya <nara@streamhpc.com>
Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
Co-authored-by: Balint Soproni <balint@streamhpc.com>
Co-authored-by: Beatriz Navidad Vilches <beatriz@streamhpc.com>
Co-authored-by: Nick Breed <nick@streamhpc.com>
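
Illustrative note on the scan accumulator entries above: the sketch below is a host-only approximation of the idea, not rocPRIM code. It assumes a sequential exclusive scan whose accumulator type defaults to the result type of the binary operator and can be overridden by the caller. The names `exclusive_scan_sketch` and `default_acc_type` are hypothetical and exist only for this example. The real device algorithms and their template parameters may differ.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical tag: "no accumulator type requested, derive it from the operator".
struct default_acc_type {};

// Host-only sketch of an exclusive scan with an optional accumulator type.
// AccType comes first so a caller can name it while the rest is deduced.
template <class AccType = default_acc_type, class InputT, class InitT, class BinaryOp>
auto exclusive_scan_sketch(const std::vector<InputT>& input, InitT init, BinaryOp op)
{
    // Result type of applying the operator to (initial value, input value).
    using op_result = std::decay_t<decltype(op(std::declval<InitT>(), std::declval<InputT>()))>;

    // Use the operator's result type unless the caller asked for another type.
    using acc_type = std::conditional_t<std::is_same<AccType, default_acc_type>::value,
                                        op_result,
                                        AccType>;

    std::vector<acc_type> output;
    output.reserve(input.size());

    acc_type acc = static_cast<acc_type>(init);
    for (const InputT& value : input)
    {
        output.push_back(acc);                      // exclusive: emit before combining
        acc = static_cast<acc_type>(op(acc, value)); // every partial result is stored as acc_type
    }
    return output;
}
```

With 8-bit inputs `{200, 100, 50}`, an initial value of `std::uint8_t{0}` and an operator `[](int a, int b) { return a + b; }`, the default call accumulates in `int` and yields `{0, 200, 300}`. Calling `exclusive_scan_sketch<std::uint8_t>(...)` forces an 8-bit accumulator, which wraps around and yields `{0, 200, 44}`. This is the kind of difference an exposed accumulator type lets a caller control.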