cspan: Suggesting an ergonomic C99 API with multidimensional matrices for BLIS #772
Comments
This is neat, but does it need to be part of BLIS or can it live as an independent project that uses BLIS and potentially CBLAS? |
FYI, Both the reference implementation of mdspan ( https://github.com/kokkos/mdspan ), and NVIDIA's implementation of mdspan in the HPC SDK, have always provided the version of |
I share Jeff's sentiment that this is neat, btw :-) . |
Thanks to both of you! Yes, cspan obviously has fewer features than mdspan with its proposed slicing. After all, they spent nearly a decade on it. But I think cspan will still cover a lot of use cases, and in some cases it is even less verbose and simpler to use than std::mdspan.
Even though cspan is a self-contained, generic library, my assumption was that it is most useful in the context of BLIS/CBLAS, although dynamic multi-dimensional arrays have other useful applications. It surely doesn't need to be a part of BLIS, but many hesitate to use libraries found in the wild, particularly when they have not first been approved or recommended by the community or in the documentation (it's also just a single small file currently). My thought was that it could be a foundation for an easier (and maybe less bug-prone?) way to call BLIS/CBLAS functionality, provided that someone added wrapper functions that take cspan matrices as arguments. |
Unfortunately, WG21 did not bless our multiple efforts to reduce mdspan's verbosity. For example, we suggested a small change to the language's definition of incomplete type in order for us to spell extents more concisely, but we didn't have enough experience at the time to show that it wouldn't break something. We also suggested more concise spellings of That being said, mdspan is very much a C++ library, with C++ ideas about customization. C should do things the C way; Fortran should do things the Fortran way. Nevertheless, it should be straightforward for this library to interact with mdspan. If you store the pointer first, the extents second, and the strides last, it should even be bitwise copyable into an mdspan with the matching layout and run-time extents (with the caveat that C++ doesn't yet have P2642, which would provide for BLAS-like or overaligned layouts). |
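To make the "pointer first, extents second, strides last" ordering concrete, here is a minimal C99 sketch. The type `view2d` and the helpers `view2d_make` and `view2d_at` are hypothetical names invented for illustration; they are not cspan's actual types or API, and the sketch makes no claim about binary compatibility with any particular mdspan implementation.

```c
#include <stddef.h>

/* Hypothetical 2-D view (NOT cspan's actual type): pointer first,
   extents second, strides last, in the order suggested above. */
typedef struct {
    double   *data;       /* pointer to the first element          */
    ptrdiff_t extent[2];  /* number of rows, number of columns     */
    ptrdiff_t stride[2];  /* element step per row and per column   */
} view2d;

/* Build a row-major view with a BLAS-style leading dimension ld >= cols. */
static view2d view2d_make(double *data, ptrdiff_t rows, ptrdiff_t cols,
                          ptrdiff_t ld)
{
    view2d v = { data, { rows, cols }, { ld, 1 } };
    return v;
}

/* Address of element (i, j): data + i*stride[0] + j*stride[1]. */
static double *view2d_at(view2d v, ptrdiff_t i, ptrdiff_t j)
{
    return v.data + i * v.stride[0] + j * v.stride[1];
}
```

With a 12-element buffer and `view2d_make(buf, 2, 3, 4)`, element (1, 2) lives at offset 1*4 + 2 = 6; the padding column at the end of each row is never touched by the view.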
I have wondered whether some of the syntax could have been a little less verbose. Speaking as a former C++ developer of two decades, I find mdspan an impressive software engineering achievement, particularly in getting it through the needle's eye of the committee. While talking about std::mdspan, I believe Bryce Lelbach mentioned in one of his talks that joined/flattened iteration was left out because of poor performance, or compilers' inability to optimize it. I think this should be revisited for C++26, because with cspan, joined iteration is only 50% slower than raw nested loops with clang 16.0.6 on my hardware. In this case, I iterate over two 3D sliced spans and add the elements of one to the other. With gcc 13.2 it is less than 2X slower. These loops are already incredibly fast, so 2X slower is still very fast and useful. The benchmark is here.
|
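For readers unfamiliar with the two loop shapes being compared, the following is a rough C99 sketch of "raw nested loops" versus "joined" (flattened) iteration over two strided 3D views. It is not the benchmark code referred to above and does not use cspan's API; `view3d`, `offset3`, `add_nested`, and `add_joined` are hypothetical names, and both views are assumed to have the same extents.

```c
#include <stddef.h>

/* Hypothetical strided 3-D view; not cspan's actual type or API. */
typedef struct {
    double   *data;
    ptrdiff_t extent[3];
    ptrdiff_t stride[3];
} view3d;

/* Linear offset of index (i, j, k) within a strided view. */
static ptrdiff_t offset3(view3d v, ptrdiff_t i, ptrdiff_t j, ptrdiff_t k)
{
    return i * v.stride[0] + j * v.stride[1] + k * v.stride[2];
}

/* Baseline: dst += src using three nested loops. */
static void add_nested(view3d dst, view3d src)
{
    for (ptrdiff_t i = 0; i < dst.extent[0]; ++i)
        for (ptrdiff_t j = 0; j < dst.extent[1]; ++j)
            for (ptrdiff_t k = 0; k < dst.extent[2]; ++k)
                dst.data[offset3(dst, i, j, k)] += src.data[offset3(src, i, j, k)];
}

/* Joined iteration: one loop whose 3-D index advances like an odometer. */
static void add_joined(view3d dst, view3d src)
{
    ptrdiff_t n = dst.extent[0] * dst.extent[1] * dst.extent[2];
    ptrdiff_t idx[3] = { 0, 0, 0 };

    for (ptrdiff_t count = 0; count < n; ++count) {
        dst.data[offset3(dst, idx[0], idx[1], idx[2])] +=
            src.data[offset3(src, idx[0], idx[1], idx[2])];

        /* Advance the innermost index; carry into outer ones on wrap-around. */
        for (int d = 2; d >= 0; --d) {
            if (++idx[d] < dst.extent[d]) break;
            idx[d] = 0;
        }
    }
}
```

The extra state (`idx`) and the carry logic in the joined version are the kind of overhead that a compiler has to see through for the two forms to perform alike.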
@tylov Thanks for your kind words about mdspan! It was a long process. Usually when C++ developers think of "iteration," they think of iterators (or C++20 ranges, which are iterator-like). Those are things that look like pointers, so that Iterators iterate over the range of the mdspan. The domain of the mdspan is its It's rare that I see code that iterates over the elements of a single mdspan. Usually, code either accesses multiple mdspan with the same domain, or is a "stencil loop" (that reads elements other than the current element, and modifies the current element). In both those cases, iterating over the extents is more helpful than iterating over the elements. What's currently missing in C++ is parallel ranges. The C++17 parallel algorithms (e.g., Bryce actually wrote a paper on multidimensional iteration a few years ago. He found that flattening via iterators didn't work so well, because multidimensional iterators are stateful. Coroutines actually worked better. I've written a bit on implementing generic mdspan iteration in this reference mdspan issue, and written a code example in this pull request. |
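As a small illustration of why iterating over the extents (the index domain) suits stencil loops, here is a C99 sketch. The function `stencil_avg4` is a hypothetical example, not taken from mdspan, cspan, or the linked issue; it assumes dense row-major storage and writes to a separate output array.

```c
#include <stddef.h>

/* For every interior index (i, j), write the average of the four
   neighbours of in(i, j) to out(i, j). The loops run over indices,
   so neighbouring elements and a second array are easy to address. */
static void stencil_avg4(const double *in, double *out,
                         ptrdiff_t rows, ptrdiff_t cols)
{
    for (ptrdiff_t i = 1; i + 1 < rows; ++i)
        for (ptrdiff_t j = 1; j + 1 < cols; ++j)
            out[i * cols + j] = 0.25 * (in[(i - 1) * cols + j] +
                                        in[(i + 1) * cols + j] +
                                        in[i * cols + (j - 1)] +
                                        in[i * cols + (j + 1)]);
}
```

An element-wise iterator over `in` alone would give access to the current value but not to its neighbours or to the matching position in `out`, which is the point made in the comment above.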
This is already pretty far off-topic but quick plug for a library I wrote called MArray. It is very much like mdspan but supports owning and non-owning (view) containers, sub-slicing, iteration, both fixed and variable-dimension arrays, expression templates, etc. |
Thanks guys, very informative. I'll take a look at these. And yes, we are off-topic, so you may close the issue/discussion. |
May I suggest this discussion be continued on the BLIS discord server? |
This is more of a question: is there interest in a small, fast, and generic C99 implementation of a NumPy-like multidimensional array-view API (currently a header file of ~200 LOC)? I am not currently a BLIS user, but I gathered it may be of some interest to C users of BLIS. It does not currently compile as C++ (this can be done), but C is the primary target. It supports
Note that C++23 std::mdspan will not support the two last bullets at all (maybe in C++26).
The API could possibly be adapted at some level as a "contribution API" that makes it more convenient to use BLIS directly from C, by extending it to wrap BLIS function calls, as shown in the example below. I am not aware of any similar C library with this combination of features, ergonomics, and small size. I made a recent reddit post here. The example below is a rewrite of the example in the C++ std::mdspan proposal