StreamHPC 2024-01-16 (#509)
* Config tuning and dynamic dispatch for device segmented radix sort

* add a script to the autotune:build job that removes the executables if their total size is too large for the artifact

* fix ci script

* fix typo

* autotune:execute-tuning checks if build artifact has executables

* fix incrementing in script

* fix checking the executables

* fix review comments

* fix ci script

* fix printing error by using multiline yaml

* test ci script

* let's try the backslash

* this should work now

* move autotune:execute-tuning to use multiline yaml for script

* docs(partition_two_way): add partition_two_way to sphinx

* docs(batch_memcpy): add batch_memcpy to sphinx

* docs(intrinsics): add match_any and group_elect to sphinx

* docs(memcpy): improve consistency with other pages

* docs: Migrate to using rocm-docs-core with the extension config

* docs: Declare TOCs in _toc.yml.in

This fixes the warnings given by sphinx_external_toc. Be explicit and
add toc `tableofcontents` directives where the TOCs should be inserted.

See https://github.com/executablebooks/sphinx-external-toc#add-a-toc-to-a-pages-content
for more info.

* Add memcpy to the summary of operations

* Add exclusive_scan interfaces without initial value to warp_scan

CUB has these, therefore hipCUB needs them too. Currently these are being
worked around in hipCUB by using undocumented APIs of rocPRIM (`to_exclusive`).

See for example [warp_scan.hpp:107 in hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/f459480f78164328214b75b16ffef338f1d4bc89/hipcub/include/hipcub/backend/rocprim/warp/warp_scan.hpp#L107)

* Add tests for warp exclusive_scan without initial value

The tests are based on the current tests modified to skip checking
the first value of each warp.
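The testing approach above can be sketched on the host. This is a minimal C++ sketch, not code from the rocPRIM test suite: `exclusive_scan_no_init` and `matches_except_first` are hypothetical helpers. The reference scan runs independently over each logical warp of `warp_size` elements and leaves the first element of each warp unset, because the API leaves that value unspecified; the comparison then skips exactly those positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: exclusive scan without an initial value,
// applied independently to each logical warp of `warp_size` elements.
// Element 0 of each warp has no defined value, so it is left untouched.
template <typename T, typename BinaryOp>
std::vector<T> exclusive_scan_no_init(const std::vector<T>& input,
                                      std::size_t warp_size,
                                      BinaryOp op)
{
    std::vector<T> expected(input.size());
    for (std::size_t base = 0; base < input.size(); base += warp_size)
    {
        // Running aggregate of all elements before the current lane.
        T accumulator = input[base];
        for (std::size_t lane = 1; lane < warp_size && base + lane < input.size(); ++lane)
        {
            expected[base + lane] = accumulator;
            accumulator = op(accumulator, input[base + lane]);
        }
    }
    return expected;
}

// Compares results while skipping the first (unspecified) value of each warp.
template <typename T>
bool matches_except_first(const std::vector<T>& output,
                          const std::vector<T>& expected,
                          std::size_t warp_size)
{
    if (output.size() != expected.size()) return false;
    for (std::size_t i = 0; i < output.size(); ++i)
    {
        if (i % warp_size == 0) continue; // not specified by the API
        if (output[i] != expected[i]) return false;
    }
    return true;
}
```

For two warps of four lanes with input `1..8` and `+`, the checked positions hold `1, 3, 6` and `5, 11, 18`.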

* Add warp_scan::exclusive_scan overloads without initial value to CHANGELOG

* Add runtime errors for too-large logical warp sizes to the new warp_scan function

* Improve documentation of warp_scan::exclusive_scan without initial value

- Add the no initial value part to the brief description
- Hide the `enable_if` and the overloads required for runtime errors from
  the docs.

* fix: Fixed doxygen warning in device_config_helper.hpp

* style: Minor edits to warp_exclusive_scan without init

* fix(test_device_adjacent_difference.cpp): fixed unused variable warning

* Consistent doxygen parameters in the new excl. scan APIs

* Fix, simplify warp_id & lane_id variables in warp_exclusive_scan without init

* Fix MSVC warning due to deprecated getenv

* Fix formatting in test_warp_scan.hpp

* clang-format

* Fix linker issues for test debug compilation

* Use shuffle instead of shuffle_random in hipgraph test

* style(device_scan.hpp): use chevron-style ('<<<...>>>') kernel launching

* fix(device_scan.hpp): derive the intermediate accumulator type from the scan operator instead of the initial value/input type

This reduces the number of type conversions and makes the accuracy of the operation directly dependent on the chosen binary operator.
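To illustrate the accuracy point, here is a minimal sequential C++17 sketch (hypothetical helper names, host code only, not rocPRIM's implementation). With `std::uint8_t` inputs and an operator that returns `int`, deriving the accumulator from the operator via `std::invoke_result_t` keeps the partial sums as `int`, while an accumulator of the input type wraps around modulo 256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical: accumulator type taken from the scan operator's result type.
template <typename T, typename BinaryOp>
auto inclusive_scan_op_acc(const std::vector<T>& input, BinaryOp op)
{
    using acc_type = std::invoke_result_t<BinaryOp, T, T>;
    std::vector<acc_type> out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        out.push_back(i == 0 ? static_cast<acc_type>(input[0]) : op(out.back(), input[i]));
    }
    return out;
}

// Hypothetical: accumulator type taken from the input value type, so every
// partial result is converted back to T (and may wrap around).
template <typename T, typename BinaryOp>
std::vector<T> inclusive_scan_input_acc(const std::vector<T>& input, BinaryOp op)
{
    std::vector<T> out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        out.push_back(i == 0 ? input[0] : static_cast<T>(op(out.back(), input[i])));
    }
    return out;
}
```

With inputs `{200, 100, 50}` and an `int`-returning addition, the first version yields `200, 300, 350` and the second `200, 44, 94`.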

* fix(device_scan_by_key.hpp): derive the intermediate accumulator type from the scan operator instead of the initial value/input type

* test(test_device_scan.cpp): scan tests derive accumulator type from output of operator on device and host

* test(test_device_scan.cpp): scan by key tests derive accumulator type from output of operator on device and host

* Disable __int128_t tests on platforms without support

* docs(device_scan.hpp): reflect device scan accumulator changes in documentation

* fix(device_scan.hpp): use rocprim::detail::match_result_type instead of std::result_of

This fixes compile time errors where the resulting type cannot be derived from device-only lambdas and functors.

* fix(device_scan.hpp,device_scan_by_key.hpp): revert default accumulator type in scan algorithms back to using result type

* revert(test_device_scan.cpp): scan by key tests derive accumulator type from output of operator on device and host

This reverts commit 5bb5b16.

* feat(device_scan.hpp): added an optional type parameter for the accumulator type in scan algorithms

By default the accumulator type is based on the scan operator. This is the intended behaviour for hipCUB, but rocThrust still bases this on the value type of the input iterator. To accommodate both requirements, the accumulator type had to be exposed.
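A minimal sketch of exposing the accumulator as an optional type parameter, assuming a hypothetical tag type `derive_from_op` as the default meaning "use the operator's result type". The function name `scan_with_acc` and the tag are illustrative only and are not rocPRIM's actual signature; the default chosen here is an assumption for the example. A caller that wants the input value type as accumulator (the rocThrust case) passes it explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical tag meaning "derive the accumulator from the scan operator".
struct derive_from_op {};

// Resolves the accumulator type: the operator's result type for the tag,
// otherwise whatever type the caller passed explicitly.
template <typename AccType, typename T, typename BinaryOp>
using resolved_acc_t = std::conditional_t<std::is_same<AccType, derive_from_op>::value,
                                          std::invoke_result_t<BinaryOp, T, T>,
                                          AccType>;

// Hypothetical sequential inclusive scan with an optional accumulator type.
// AccType comes first so it can be given explicitly while T and BinaryOp
// are still deduced from the arguments.
template <typename AccType = derive_from_op, typename T, typename BinaryOp>
std::vector<resolved_acc_t<AccType, T, BinaryOp>>
    scan_with_acc(const std::vector<T>& input, BinaryOp op)
{
    using acc_type = resolved_acc_t<AccType, T, BinaryOp>;
    std::vector<acc_type> out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        // Every partial result is stored as acc_type before the next step.
        out.push_back(i == 0 ? static_cast<acc_type>(input[0])
                             : static_cast<acc_type>(op(out.back(), input[i])));
    }
    return out;
}
```

`scan_with_acc(in, op)` uses the operator's result type, while `scan_with_acc<std::uint8_t>(in, op)` forces an input-typed accumulator.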

* revert(test_device_scan.cpp): scan tests derive accumulator type from output of operator on device and host

This reverts commit 0d2d482.

* docs(device_scan.hpp,device_scan_by_key.hpp): updated the documentation to include the optional type parameter for the accumulator in scan algorithms

* revert(device_scan.hpp): reflect device scan accumulator changes in documentation

This reverts commit 9d0ab0e.

* style: improve formatting

* docs(changelog.md): reflect changes to intermediate type in changelog

* docs(changelog.md): update the changelog to include the addition of the optional accumulator type in scan algorithms

* fix(device_scan.hpp,device_scan_by_key.hpp): use initial value for accumulator for exclusive scan

* docs(device_scan.hpp,device_scan_by_key.hpp): update accumulator type parameter documentation

* style: update copyright

* style(test_device_scan.cpp): fix formatting

* feat(type_traits.hpp): expose 'invoke_result' and 'invoke_result_binary_op'

These were previously internal functions.

* removed unused-code warnings in benchmarks and added warning compiler flags to the GitLab CI benchmark build

* docs: improve documentation

* style: update copyright and fix style

* Formatting and copyright date changes

* refactor(match_result_type.hpp,type_traits.hpp): moved implementation details of invoke_result to type_traits.hpp

* test(test_type_traits.cpp): added tests for 'invoke_result'

* refactor(test_invoke_result.cpp): rename from 'test_type_traits' to 'test_invoke_result'

* style: update copyright dates

* test(test_invoke_result.cpp): tests also cover device-only functions

* feat(type_traits.hpp): add c++17-styled aliases for 'invoke_result' and 'invoke_result_binary_op'

* style: update style

* Removed redundant inheritance in device templates

* refactor(test_invoke_result.cpp): use fixed-width integer types

* Fixed linting

* declare the return type of the lambda used in adjacent difference to avoid compile errors

* Fixed warp_exchange blocked_to_striped_shuffle and striped_to_blocked_shuffle

The logical warp size was not passed to the shuffle operation, therefore
only the first logical warp in the block was executed.
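One way this failure mode arises can be modelled on the host. This sketch assumes HIP/CUDA-style `__shfl(var, src_lane, width)` semantics, where the source lane is taken modulo `width` inside the caller's own logical warp; `shuffle_model` is a hypothetical helper, not a rocPRIM function. If the full hardware warp size is used as the width instead of the logical warp size, lanes outside the first logical warp end up reading values that belong to the first one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host model of a width-aware warp shuffle, following the
// HIP/CUDA __shfl(var, src_lane, width) convention: the source lane is
// interpreted modulo `width` within the caller's own logical warp.
int shuffle_model(const std::vector<int>& lane_values,
                  unsigned int lane,
                  unsigned int src_lane,
                  unsigned int width)
{
    const unsigned int logical_warp_base = (lane / width) * width;
    return lane_values[logical_warp_base + src_lane % width];
}
```

With 16 lanes holding `lane * 10` and logical warps of 8, lane 9 asking for logical lane 2 reads 100 with `width = 8`, but 20 (lane 2 of the first logical warp) with `width = 16`.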

* add new api calls for device_adjacent_difference

* Improved warp_exchange test suite

Multiple logical warps are executed per test. Added tests with 2 and 8
byte value types.

* rename new api function to avoid overload and make things clearer

* Updated copyright dates

* fix a call that was not updated in the test

* Updated changelog

* fix review comments
update docs comments
refactor test_device_adjacent_difference in_place part

* add test cases for device_adjacent_difference to check for input iterators not returning value_type for operator[]

* fix format
check large index test

* fix format and merge errors

* fix review comments
rename to indirect iterator
simplify indirect iterator

* change adjacent_difference_alias to adjacent_difference_inplace
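A sequential sketch of why a variant with separate input and output iterators can be safe when both point to the same element: each input is read before the output at the same index is written, and the previous input is kept in a local variable. `adjacent_difference_may_alias` is a hypothetical helper; it follows the `std::adjacent_difference` convention `output[i] = op(input[i], input[i - 1])` with `output[0] = input[0]`, and it does not handle an output range shifted ahead of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical sequential adjacent difference that tolerates output == input.
template <typename InputIt, typename OutputIt, typename BinaryOp>
void adjacent_difference_may_alias(InputIt input, OutputIt output, std::size_t size, BinaryOp op)
{
    using value_type = typename std::iterator_traits<InputIt>::value_type;
    if (size == 0) return;
    value_type previous = input[0];
    output[0] = previous;
    for (std::size_t i = 1; i < size; ++i)
    {
        // Read input[i] before output[i] (possibly the same element) is written.
        const value_type current = input[i];
        output[i] = op(current, previous);
        // input[i - 1] may already be overwritten, so use the saved copy.
        previous = current;
    }
}
```

Applied to `{1, 4, 9, 16, 25}` with `std::minus<int>()`, both the out-of-place and the in-place call produce `{1, 3, 5, 7, 9}`.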

* fix review comments
have separate code paths for non aliased and in place calls in tests
documentation updates

* Fix unique_by_key to allow input and output values iterators aliasing
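A sequential reference for the aliasing case can be sketched as follows (`unique_by_key_seq` is hypothetical and is not the device algorithm). Because the write position never exceeds the read position, writing the compacted values back into the same buffer never overwrites a value that still has to be read; the previous key is kept in a local so key aliasing would be harmless too.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential unique_by_key: keeps the first key/value pair of
// every run of equal consecutive keys and returns the number of pairs kept.
template <typename Key, typename Value>
std::size_t unique_by_key_seq(const Key* keys_input, const Value* values_input, std::size_t size,
                              Key* keys_output, Value* values_output)
{
    if (size == 0) return 0;
    Key previous_key = keys_input[0];
    keys_output[0] = previous_key;
    values_output[0] = values_input[0];
    std::size_t count = 1;
    for (std::size_t read = 1; read < size; ++read)
    {
        const Key current_key = keys_input[read];
        if (!(current_key == previous_key))
        {
            // count <= read, so this write never lands on an unread element.
            keys_output[count] = current_key;
            values_output[count] = values_input[read];
            ++count;
        }
        previous_key = current_key;
    }
    return count;
}
```

With keys `{1, 1, 2, 2, 2, 3, 1}` and values compacted in place, the first four values become the first value of each run.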

* fix ROCm 6.0 compilation errors
add API variant logging to the other test

* Help compiler optimize unused value_type

* refactor(detail/various.hpp): Perfect forwarding for foreach_in_tuple

Instead of taking l-value references, use std::forward to forward the
tuple elements, with their constness and value category, to the passed
function. This, for example, allows foreach_in_tuple to be called on
const tuples.

* build(CMakeLists.txt): Skip packaging when project isn't toplevel

Skip packaging when we're being added as a sub-project
(for example using FetchContent). Only a single project can use
`rocm_create_package()`, and we don't want to trample on whoever is
depending on us.

This should fix the warnings like
"rocm_package_add_deb_dependencies called after rocm_create_package!"
in hipCUB (and probably rocThrust).

* CHANGELOG and copyright updates

* refactor(detail/temp_storage.hpp): Make layout() const on partitions

* build(cmake): Add cmake option to disable installation

Default to ON for backward compatibility

* refactor: Further improve foreach_in_tuple

- Use an array instead of an (implicit) initializer_list.
- Be consistent with the template parameter name
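The two refactorings of foreach_in_tuple can be sketched together in C++14 (the namespace `sketch` and this exact implementation are assumptions, not the rocPRIM source). The tuple is taken as a forwarding reference so `std::get` preserves constness and value category, and the per-element calls are expanded inside an array initializer, which guarantees left-to-right evaluation; the leading `0` keeps the array non-empty for empty tuples.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch
{
template <typename Tuple, typename Function, std::size_t... Indices>
void foreach_in_tuple_impl(Tuple&& tuple, Function&& function, std::index_sequence<Indices...>)
{
    // One call per element, expanded in an array initializer so the calls run
    // left to right. Forwarding the tuple into each std::get is the usual
    // idiom: every element is accessed (and at most moved from) once.
    int expansion[] = {0, ((void)function(std::get<Indices>(std::forward<Tuple>(tuple))), 0)...};
    (void)expansion;
}

// Calls `function` on every element; works with const, non-const and rvalue tuples.
template <typename Tuple, typename Function>
void foreach_in_tuple(Tuple&& tuple, Function&& function)
{
    constexpr std::size_t size = std::tuple_size<std::decay_t<Tuple>>::value;
    foreach_in_tuple_impl(std::forward<Tuple>(tuple),
                          std::forward<Function>(function),
                          std::make_index_sequence<size>{});
}
} // namespace sketch
```

A const tuple can now be visited with a `const auto&` lambda, and a non-const tuple can still be modified element by element.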

* build(CHANGELOG.md, CMakeLists.txt): Set version number in CMake

* docs: Fix some documentation warnings/errors

* docs: Fixed changelog style

* refactor(test_device_adjacent_difference): Reduce code duplication

Simplify code by factoring out common parts of in-place and out-of-place
tests.

* docs: Fixed SPHINX_DIR

* refactor(warp_scan): Deprecate undocumented to_exclusive APIs

These were used by prior versions of CUB, but now have public
replacements.

* Add initial Windows CI

* Substituted DeviceSelectWarpSize with device_test_enabled_for_warp_size_v

* Documentation fix after rebase

---------

Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara Prasetya <nara@streamhpc.com>
Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
Co-authored-by: Balint Soproni <balint@streamhpc.com>
Co-authored-by: Beatriz Navidad Vilches <beatriz@streamhpc.com>
Co-authored-by: Nick Breed <nick@streamhpc.com>
7 people authored Mar 19, 2024
1 parent 2dce71a commit 013fb2c
Showing 91 changed files with 9,911 additions and 2,344 deletions.
99 changes: 98 additions & 1 deletion .gitlab-ci.yml
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -29,6 +29,7 @@ include:
- /deps-docs.yaml
- /deps-rocm.yaml
- /deps-vcpkg.yaml
- /deps-windows.yaml
- /gpus-rocm.yaml
- /rules.yaml

@@ -247,6 +248,7 @@ build:benchmark:
-S $CI_PROJECT_DIR
-G Ninja
-D CMAKE_CXX_COMPILER="$AMDCLANG"
-D CMAKE_CXX_FLAGS="-Wall -Wextra -Werror -Wno-#pragma-messages"
-D CMAKE_BUILD_TYPE=Release
-D BUILD_TEST=OFF
-D BUILD_EXAMPLE=OFF
@@ -260,6 +262,48 @@ build:benchmark:
- $BUILD_DIR/deps/googlebenchmark/
expire_in: 2 weeks

build:windows:
stage: build
needs: []
extends:
- .rules:build
- .gpus:rocm-windows
- .deps:rocm-windows
- .deps:visual-studio-devshell
parallel:
matrix:
- BUILD_TYPE:
# Disabled due to extensive link times.
# This is tracked in issue 679
#- Debug
- Release
BUILD_TARGET:
- BENCHMARK
- TEST
script:
- mkdir -p $CI_PROJECT_DIR/build
- cmake -G Ninja
-S $CI_PROJECT_DIR
-B $CI_PROJECT_DIR/build
-D BUILD_$BUILD_TARGET=ON
-D GPU_TARGETS=$GPU_TARGET
-D CMAKE_CXX_COMPILER:PATH="${env:HIP_PATH}\bin\clang++.exe"
-D CMAKE_C_COMPILER:PATH="${env:HIP_PATH}\bin\clang.exe"
-D CMAKE_PREFIX_PATH:PATH="${env:HIP_PATH}"
-D CMAKE_BUILD_TYPE="$BUILD_TYPE"
- cmake --build "$CI_PROJECT_DIR/build"
artifacts:
paths:
- $CI_PROJECT_DIR/build/test/test_*
- $CI_PROJECT_DIR/build/test/rocprim/test_*
- $CI_PROJECT_DIR/build/test/CTestTestfile.cmake
- $CI_PROJECT_DIR/build/test/rocprim/CTestTestfile.cmake
- $CI_PROJECT_DIR/build/gtest/
- $CI_PROJECT_DIR/build/CMakeCache.txt
- $CI_PROJECT_DIR/build/.ninja_log
- $CI_PROJECT_DIR/build/CTestTestfile.cmake
expire_in: 2 weeks

autotune:build:
stage: autotune
needs: []
@@ -289,6 +333,19 @@ autotune:build:
-D GPU_TARGETS=$GPU_TARGETS
- cmake --build . --target $BENCHMARK_TARGETS
- 'rm -rf $BUILD_DIR/benchmark/benchmark*.parallel'
# remove benchmark executables if their size together is too large for gitlab ci to handle
- |
total_size_bytes=0
while read -r file_size; do
total_size_bytes=$((total_size_bytes + file_size))
done < <(stat --format="%s" benchmark/benchmark*)
total_size_gib="$(numfmt --round=down --to-unit=Gi "$total_size_bytes")"
if [ "$total_size_gib" -ge 3 ]; then
printf "Total size: %s (%d bytes) > 3GiB, skipping benchmark executables from the artifact.\n" \
"$(numfmt --to=iec-i "$total_size_bytes")" "$total_size_bytes"
rm benchmark/benchmark*
fi
artifacts:
paths:
- $BUILD_DIR/benchmark/benchmark*
@@ -320,6 +377,39 @@ test:
--resource-spec-file ./resources.json
--parallel $PARALLEL_JOBS

.test-windows-base:
stage: test
extends:
- .deps:rocm-windows
- .gpus:rocm-gpus-windows
- .deps:visual-studio-devshell
- .rules:test
script:
- cd $CI_PROJECT_DIR/build
- ctest --output-on-failure

# Disabled due to extensive link times.
# This is tracked in issue 679
# test-windows-debug:
# extends:
# - .test-windows-base
# needs:
# - job: build:windows
# parallel:
# matrix:
# - BUILD_TYPE: Debug
# BUILD_TARGET: TEST

test-windows-release:
extends:
- .test-windows-base
needs:
- job: build:windows
parallel:
matrix:
- BUILD_TYPE: Release
BUILD_TARGET: TEST

.test-package:
script:
- cmake
@@ -369,6 +459,8 @@ test:deb:

test:docs:
stage: test
variables:
SPHINX_DIR: $DOCS_DIR/sphinx
extends:
- .rules:test
- .build:docs
@@ -472,6 +564,11 @@ autotune:execute-tuning:
# On ROCm 5.7 or later, check if this can be removed - the presumption is that the failure is caused by a compiler issue.
- >
cd "${CI_PROJECT_DIR}"
- |
if [ ! -d "${BUILD_DIR}/benchmark" ]; then
echo "There are no benchmark executables. Run the build job with a BUILD_TARGET."
exit 1
fi
- mkdir -p "${AUTOTUNE_RESULT_DIR}"
- python3
.gitlab/run_benchmarks.py
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,22 @@
Documentation for rocPRIM is available at
[https://rocm.docs.amd.com/projects/rocPRIM/en/latest/](https://rocm.docs.amd.com/projects/rocPRIM/en/latest/).

## Unreleased rocPRIM-3.2.0 for ROCm 6.2.0

### Additions

* New overloads for `warp_scan::exclusive_scan` that take no initial value. These new overloads will write an unspecified result to the first value of each warp.
* The internal accumulator type of `inclusive_scan(_by_key)` and `exclusive_scan(_by_key)` is now exposed as an optional type parameter.
* The default accumulator type is still the value type of the input iterator (inclusive scan) or the initial value's type (exclusive scan).
This is the same behaviour as before this change.
* New overload for `device_adjacent_difference_inplace` that allows separate input and output iterators, but allows them to point to the same element.

### Fixes

* Fixed incorrect results of `warp_exchange::blocked_to_striped_shuffle` and `warp_exchange::striped_to_blocked_shuffle` when the block size is
larger than the logical warp size. The test suite has been updated with such cases.
* Fixed incorrect results returned when calling device `unique_by_key` with overlapping `values_input` and `values_output`.

## Unreleased rocPRIM-3.1.0 for ROCm 6.1.0

### Additions
52 changes: 32 additions & 20 deletions CMakeLists.txt
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -29,6 +29,12 @@ set(CMAKE_INSTALL_PREFIX "/opt/rocm" CACHE PATH "Install path prefix, prepended
# rocPRIM project
project(rocprim LANGUAGES CXX)

if (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_SOURCE_DIR)
set(ROCPRIM_PROJECT_IS_TOP_LEVEL TRUE)
else()
set(ROCPRIM_PROJECT_IS_TOP_LEVEL FALSE)
endif()

#Adding CMAKE_PREFIX_PATH
if(WIN32)
set(ROCM_ROOT "$ENV{HIP_PATH}" CACHE PATH "Root directory of the ROCm installation")
@@ -44,6 +50,7 @@ option(USE_HIP_CPU "Prefer HIP-CPU runtime instead of HW acceleration" OFF)
# Disables building tests, benchmarks, examples
option(ONLY_INSTALL "Only install" OFF)
option(BUILD_CODE_COVERAGE "Build with code coverage enabled" OFF)
option(ROCPRIM_INSTALL "Enable installation of rocPRIM (projects embedding rocPRIM may want to turn this OFF)" ON)

# CMake modules
list(APPEND CMAKE_MODULE_PATH
@@ -94,7 +101,7 @@ endif()

# FOR HANDLING ENABLE/DISABLE OPTIONAL BACKWARD COMPATIBILITY for FILE/FOLDER REORG
option(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY "Build with file/folder reorg with backward compatibility enabled" OFF)
if(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY AND NOT WIN32)
if(ROCPRIM_INSTALL AND BUILD_FILE_REORG_BACKWARD_COMPATIBILITY AND NOT WIN32)
rocm_wrap_header_dir(
"${PROJECT_SOURCE_DIR}/rocprim/include/rocprim"
WRAPPER_LOCATIONS rocprim/include/rocprim
@@ -114,7 +121,7 @@ if(USE_HIP_CPU)
endif()

# Setup VERSION
set(VERSION_STRING "3.1.0")
set(VERSION_STRING "3.2.0")
rocm_setup_version(VERSION ${VERSION_STRING})

# Print configuration summary
@@ -124,20 +131,24 @@ print_configuration_summary()
# rocPRIM library
add_subdirectory(rocprim)

if(NOT ONLY_INSTALL AND (BUILD_TEST OR BUILD_BENCHMARK))
if(ROCPRIM_PROJECT_IS_TOP_LEVEL AND NOT ONLY_INSTALL AND (BUILD_TEST OR BUILD_BENCHMARK))
rocm_package_setup_component(clients)
endif()

# Tests
if(BUILD_TEST AND NOT ONLY_INSTALL)
rocm_package_setup_client_component(tests)
if (ROCPRIM_PROJECT_IS_TOP_LEVEL)
rocm_package_setup_client_component(tests)
endif()
enable_testing()
add_subdirectory(test)
endif()

# Benchmarks
if(BUILD_BENCHMARK AND NOT ONLY_INSTALL)
rocm_package_setup_client_component(benchmarks)
if (ROCPRIM_PROJECT_IS_TOP_LEVEL)
rocm_package_setup_client_component(benchmarks)
endif()
add_subdirectory(benchmark)
endif()

@@ -147,20 +158,21 @@ if(BUILD_EXAMPLE AND NOT ONLY_INSTALL)
endif()

# Package
set(BUILD_SHARED_LIBS ON) # Build as though shared library for naming
rocm_package_add_dependencies(DEPENDS "hip-rocclr >= 3.5.0")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE.txt")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")

set(CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION "\${CPACK_PACKAGING_INSTALL_PREFIX}" )

rocm_create_package(
NAME rocprim
DESCRIPTION "Radeon Open Compute Parallel Primitives Library"
MAINTAINER "rocPRIM Maintainer <rocprim-maintainer@amd.com>"
HEADER_ONLY
)

if (ROCPRIM_PROJECT_IS_TOP_LEVEL)
set(BUILD_SHARED_LIBS ON) # Build as though shared library for naming
rocm_package_add_dependencies(DEPENDS "hip-rocclr >= 3.5.0")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE.txt")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")

set(CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION "\${CPACK_PACKAGING_INSTALL_PREFIX}" )

rocm_create_package(
NAME rocprim
DESCRIPTION "Radeon Open Compute Parallel Primitives Library"
MAINTAINER "rocPRIM Maintainer <rocprim-maintainer@amd.com>"
HEADER_ONLY
)
endif()

#
# ADDITIONAL TARGETS FOR CODE COVERAGE
9 changes: 6 additions & 3 deletions benchmark/CMakeLists.txt
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -95,7 +95,9 @@ function(add_rocprim_benchmark BENCHMARK_SOURCE)
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/benchmark"
)

rocm_install(TARGETS ${BENCHMARK_TARGET} COMPONENT benchmarks)
if (ROCPRIM_INSTALL)
rocm_install(TARGETS ${BENCHMARK_TARGET} COMPONENT benchmarks)
endif()
if (WIN32 AND NOT DEFINED DLLS_COPIED)
set(DLLS_COPIED "YES")
set(DLLS_COPIED ${DLLS_COPIED} PARENT_SCOPE)
@@ -145,7 +147,8 @@ add_rocprim_benchmark(benchmark_device_run_length_encode.cpp)
add_rocprim_benchmark(benchmark_device_scan.cpp)
add_rocprim_benchmark(benchmark_device_scan_by_key.cpp)
add_rocprim_benchmark(benchmark_device_select.cpp)
add_rocprim_benchmark(benchmark_device_segmented_radix_sort.cpp)
add_rocprim_benchmark(benchmark_device_segmented_radix_sort_keys.cpp)
add_rocprim_benchmark(benchmark_device_segmented_radix_sort_pairs.cpp)
add_rocprim_benchmark(benchmark_device_segmented_reduce.cpp)
add_rocprim_benchmark(benchmark_device_transform.cpp)
add_rocprim_benchmark(benchmark_warp_exchange.cpp)
12 changes: 12 additions & 0 deletions benchmark/ConfigAutotuneSettings.cmake
@@ -81,5 +81,17 @@ ${TUNING_TYPES};${LIMITED_TUNING_TYPES};using_warp_scan reduce_then_scan" PARENT
set(list_across "\
binary_search upper_bound lower_bound;${TUNING_TYPES};${LIMITED_TUNING_TYPES};64 128 256;1 2 4 8 16" PARENT_SCOPE)
set(output_pattern_suffix "@SubAlgorithm@_@ValueType@_@OutputType@_@BlockSize@_@ItemsPerThread@" PARENT_SCOPE)
elseif(file STREQUAL "benchmark_device_segmented_radix_sort_keys")
set(list_across_names "\
KeyType;BlockSize;ItemsPerThread;PartitionAllowed" PARENT_SCOPE)
set(list_across "${TUNING_TYPES};128 256;4 8 16;false" PARENT_SCOPE)
set(output_pattern_suffix "\
@KeyType@_@BlockSize@_@ItemsPerThread@_@PartitionAllowed@" PARENT_SCOPE)
elseif(file STREQUAL "benchmark_device_segmented_radix_sort_pairs")
set(list_across_names "\
KeyType;ValueType;BlockSize;ItemsPerThread;PartitionAllowed" PARENT_SCOPE)
set(list_across "${TUNING_TYPES};int8_t;64;4 8 16;true false" PARENT_SCOPE)
set(output_pattern_suffix "\
@KeyType@_@ValueType@_@BlockSize@_@ItemsPerThread@_@PartitionAllowed@" PARENT_SCOPE)
endif()
endfunction()
3 changes: 1 addition & 2 deletions benchmark/benchmark_block_run_length_decode.cpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2021-2023 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2021-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -57,7 +57,6 @@ __global__
rocprim::block_load_direct_blocked(global_thread_idx, d_run_items, run_items);
rocprim::block_load_direct_blocked(global_thread_idx, d_run_offsets, run_offsets);

ROCPRIM_SHARED_MEMORY typename BlockRunLengthDecodeT::storage_type temp_storage;
BlockRunLengthDecodeT block_run_length_decode(run_items, run_offsets);

const OffsetT total_decoded_size
6 changes: 3 additions & 3 deletions benchmark/benchmark_device_histogram.parallel.hpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2023-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -358,8 +358,8 @@ struct device_histogram_benchmark_generator
template<unsigned int Channels,
unsigned int ActiveChannels,
unsigned int items_per_thread = ItemsPerThread>
auto create(std::vector<std::unique_ptr<config_autotune_interface>>& storage,
const std::vector<unsigned int>& cases) ->
auto create(std::vector<std::unique_ptr<config_autotune_interface>>& /*storage*/,
const std::vector<unsigned int>& /*cases*/) ->
typename std::enable_if<!(items_per_thread * Channels <= max_items_per_thread),
void>::type
{}