Commit
chore(deps): update dependency cutlass_archive to v3.6.0 (#948)
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [cutlass_archive](https://redirect.github.com/NVIDIA/cutlass) | http_archive | minor | `v3.5.1` -> `v3.6.0` |

---

### Release Notes

<details>
<summary>NVIDIA/cutlass (cutlass_archive)</summary>

### [`v3.6.0`](https://redirect.github.com/NVIDIA/cutlass/releases/tag/v3.6.0): CUTLASS 3.6.0

[Compare Source](https://redirect.github.com/NVIDIA/cutlass/compare/v3.5.1...v3.6.0)

- [Hopper structured sparse GEMM](./examples/62\_hopper_sparse_gemm/62\_hopper_sparse_gemm.cu).
  - [FP16](./test/unit/gemm/device/sm90\_sparse_gemm_f16\_f16\_f32\_tensor_op_f32.cu)
  - [FP8](./test/unit/gemm/device/sm90\_sparse_gemm_f8\_f8\_f32\_tensor_op_f32.cu)
  - [INT8](./test/unit/gemm/device/sm90\_sparse_gemm_s8\_s8\_s32\_tensor_op_s32.cu)
  - [TF32](./test/unit/gemm/device/sm90\_sparse_gemm_tf32\_tf32\_f32\_tensor_op_f32.cu)
- A refactor of the CUTLASS 3.x convolution `kernel::ConvUniversal` [API](./include/cutlass/conv/kernel/sm90\_implicit_gemm_tma_warpspecialized.hpp) to bring it in line with `gemm::GemmUniversal`. The 3.x convolution API is no longer considered a beta API.
- [An improved mixed input GEMM](./examples/55\_hopper_mixed_dtype_gemm/README.md) and a [lookup table implementation](./examples/55\_hopper_mixed_dtype_gemm/55\_hopper_int4\_fp8\_gemm.cu) for `INT4`x`FP8` scale-only mode.
- [EVT nodes for Top-K selection and softmax](./include/cutlass/epilogue/fusion/sm90\_visitor_topk_softmax.hpp) and a [GEMM example using those](./examples/61\_hopper_gemm_with_topk_and_softmax/61\_hopper_gemm_with_topk_and_softmax.cu).
- [Programmatic Dependent Launch](./include/cutlass/arch/grid_dependency_control.h) (PDL), which leverages a new Hopper feature to speed up two back-to-back kernels, and its corresponding [documentation](./media/docs/dependent_kernel_launch.md).
- [A new debugging tool, synclog](./include/cutlass/arch/synclog.hpp), for dumping all synchronization events from within a kernel to a file. See the [synclog documentation](./media/docs/utilities.md#debugging-asynchronous-kernels-with-cutlasss-built-in-synclog-tool) for details.
- A new TMA-enabled [epilogue](./include/cutlass/epilogue/collective/sm90\_epilogue_array_tma_warpspecialized.hpp) for grouped GEMM that brings a significant performance improvement, as well as its EVT support.
- A SIMT-enabled pointer-array [epilogue](./include/cutlass/epilogue/collective/sm70\_epilogue_vectorized_array.hpp).
- A new [Ping-Pong kernel schedule for Grouped GEMM](./include/cutlass/gemm/kernel/sm90\_gemm_array_tma_warpspecialized_pingpong.hpp) and some other optimizations.
- [A new instantiation strategy for CUTLASS profiler kernels](./python/cutlass_library/sm90\_shapes.py) along with [improved documentation for instantiation level in CUTLASS profiler](./media/docs/profiler.md#instantiating-more-kernels-with-hopper).
- New hardware support for comparisons and computations of [`cutlass::bfloat16_t`](./include/cutlass/bfloat16.h).
- Fixed use of `isnan` on Windows for [`half_t`](./test/unit/core/functional.cu).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/secretflow/spu).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>