Loops can be readily condensed via fypp #657

Open
sbryngelson opened this issue Oct 19, 2024 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

Member

sbryngelson commented Oct 19, 2024

do concurrent is typically used to request standard, language-level parallelism, including GPU offloading. If the compiler flag that enables this is not set (e.g., -stdpar with NVHPC), it does little beyond, perhaps, some multithreading.

It does not seem to clash with OpenACC in my experimentation so far (https://fortran-lang.discourse.group/t/how-does-openacc-collapse-interact-with-do-concurrent/6887).

With it, we can do this:

!$acc parallel loop collapse(4) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n, k = 0:m)
    rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                            (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                             - flux_n(1)%vf(j)%sf(k, l, q))
end do

instead of this:

!$acc parallel loop collapse(4) gang vector default(present)
do j = 1, sys_size
    do q = 0, p
        do l = 0, n
            do k = 0, m
                rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                                        (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                                         - flux_n(1)%vf(j)%sf(k, l, q))
            end do
        end do
    end do
end do
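
To check that the two forms compute the same thing, a self-contained sketch along these lines can be compiled with and without OpenACC. The plain arrays rhs_dc, rhs_nest, and flux, the program name, and the copyin/copyout clauses are hypothetical stand-ins for rhs_vf, flux_n, and default(present); they are assumptions for illustration, not the actual code. The printed difference should be zero.

```fortran
! Sketch: a do concurrent header versus the equivalent nested do loops.
! All names here are illustrative and not taken from the code base.
program do_concurrent_sketch
    implicit none
    integer, parameter :: m = 7, n = 3, p = 2
    integer :: k, l, q
    real(kind(0d0)) :: dx(0:m), flux(-1:m, 0:n, 0:p)
    real(kind(0d0)) :: rhs_dc(0:m, 0:n, 0:p), rhs_nest(0:m, 0:n, 0:p)

    dx = 0.5d0
    call random_number(flux)

    ! Condensed form: one loop header and one end do
    !$acc parallel loop collapse(3) gang vector copyin(dx, flux) copyout(rhs_dc)
    do concurrent (q = 0:p, l = 0:n, k = 0:m)
        rhs_dc(k, l, q) = 1d0/dx(k)*(flux(k - 1, l, q) - flux(k, l, q))
    end do

    ! Explicit form: three do statements and three end do statements
    !$acc parallel loop collapse(3) gang vector copyin(dx, flux) copyout(rhs_nest)
    do q = 0, p
        do l = 0, n
            do k = 0, m
                rhs_nest(k, l, q) = 1d0/dx(k)*(flux(k - 1, l, q) - flux(k, l, q))
            end do
        end do
    end do

    ! Both loops evaluate the same expression on the same data
    print *, 'max abs difference:', maxval(abs(rhs_dc - rhs_nest))
end program do_concurrent_sketch
```

Without an OpenACC flag the directives are treated as comments, so the same file also serves as a CPU-only check.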

I think we can still pull out a sequential loop as needed, like this:

!$acc parallel loop collapse(3) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n)
    !$acc loop seq
    do k = 0, m
        rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                                (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                                 - flux_n(1)%vf(j)%sf(k, l, q))
    end do
end do

While not an actual code improvement per se, it does seem quite helpful for readability: the loop scaffolding goes from 8 lines (four do statements and four end do statements) to 2.

sbryngelson added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Oct 19, 2024
Member Author

sbryngelson commented Oct 20, 2024

In the GPU case, this works with NVHPC but not with the CCE compilers (the error is something like "collapse requires perfectly nested do loops") [FYI @abbotts].

I reproduced it on a minimal example.

sbryngelson changed the title from "Loops can be readily condensed via do concurrent" to "Loops can be readily condensed via fypp" on Oct 22, 2024
Member Author

sbryngelson commented Oct 22, 2024

@henryleberre created this fypp macro, which does the trick:

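#! Wrap a code block in nested do loops.
#! All arguments but the last are loop headers (outermost first);
#! the last argument is the loop body supplied by #:call ... #:endcall.
#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef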

program forall_example
  implicit none
  integer :: n = 2
  integer :: m = 3
  integer :: i, j
  integer , dimension(1:2,1:2) :: x

  x(1,1) = 0
  x(1,2) = n

  x(2,1) = 1
  x(2,2) = m

  #:call forall('i=x(1,1),x(1,2)', 'j=x(2,1),x(2,2)')
    print*, i, j 
  #:endcall

end program forall_example

The generated code is:

program forall_example
  implicit none
  integer :: n = 2
  integer :: m = 3
  integer :: i, j
  integer , dimension(1:2,1:2) :: x

  x(1,1) = 0
  x(1,2) = n

  x(2,1) = 1
  x(2,2) = m

do i=x(1,1),x(1,2)
do j=x(2,1),x(2,2)
    print*, i, j 
end do
end do

end program forall_example
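
Combining the macro with the OpenACC directive from the first comment might look like the sketch below. It is a fypp source file, so it has to be run through fypp before compilation. The plain arrays rhs, flux, and dx, the program name, and the copyin/copyout clauses are hypothetical stand-ins for rhs_vf, flux_n, and default(present), and this has not been checked against the compilers mentioned above. Because the macro expands to ordinary, tightly nested do loops, the collapse clause sees the same loop nest as the hand-written version.

```fortran
#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef

! Sketch: the forall macro placed directly under an OpenACC directive.
! All names here are illustrative and not taken from the code base.
program forall_acc_sketch
  implicit none
  integer, parameter :: m = 7, n = 3
  integer :: k, l
  real(kind(0d0)) :: dx(0:m), flux(-1:m, 0:n), rhs(0:m, 0:n)

  ! Simple test data: uniform spacing, flux(k, l) = k**2
  dx = 0.5d0
  do l = 0, n
    do k = -1, m
      flux(k, l) = real(k*k, kind(0d0))
    end do
  end do

  ! The directive applies to the loop nest that the macro generates
  !$acc parallel loop collapse(2) gang vector copyin(dx, flux) copyout(rhs)
  #:call forall('l = 0, n', 'k = 0, m')
    rhs(k, l) = 1d0/dx(k)*(flux(k - 1, l) - flux(k, l))
  #:endcall

  ! rhs(0, 0) = 2*(1 - 0) and rhs(m, n) = 2*(36 - 49)
  print *, rhs(0, 0), rhs(m, n)
end program forall_acc_sketch
```

The loop headers are passed outermost first, matching the order of the do statements in the generated code.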

sbryngelson reopened this on Oct 22, 2024