This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix host/device check with -fopenmp
ClosedPublic

Authored by yaxunl on Sep 20 2019, 8:36 AM.

Details

Summary

CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation
to take advantages of multi-threads computation.

CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them
once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine
if a function is sure to be emitted.

To check host/device correctly for CUDA/HIP when -fopenmp is enabled, there needs a unified logic to determine
whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic.

This patch only affects CUDA/HIP program which are compiled with -fopenmp. It should have no effect on C++ programs
compiled with -fopenmp.

Diff Detail

Event Timeline

yaxunl created this revision.Sep 20 2019, 8:36 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: guansong. · View Herald Transcript
yaxunl edited the summary of this revision. (Show Details)Sep 20 2019, 8:46 AM
rjmccall added inline comments.Sep 20 2019, 9:49 AM
lib/Sema/Sema.cpp
1511 ↗(On Diff #221040)

There's an overload of DenseMap::erase that just takes a key value, so this whole thing can be S.DeviceCallGraph.erase(OrigCallee);.

Why do we need to erase the entry instead of re-using it? If the call graphs are different for the two use-cases, is that conflict a problem for other reasons?

yaxunl updated this revision to Diff 221097.Sep 20 2019, 12:59 PM

reuse the call tree.

yaxunl marked an inline comment as done.Sep 20 2019, 1:00 PM
yaxunl added inline comments.
lib/Sema/Sema.cpp
1511 ↗(On Diff #221040)

I think you are right. we should reuse the call graph.

rjmccall added inline comments.Sep 20 2019, 4:00 PM
lib/Sema/Sema.cpp
1511 ↗(On Diff #221040)

Okay. I believe DenseMap has a count that tells you how many entries there are for a key.

yaxunl updated this revision to Diff 221341.Sep 23 2019, 8:27 AM

revised by John's comments

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

This part of code is for delayed checking of hostness. If a host function calls device function, we do not want to diagnose it unless we are sure the host func is to be emitted.

whether a func is emitted is determined by IsKnownEmitted function. HIP and OpenMP each has its own copy.

In this case, HIP thinks a static host func is not sure to be emitted, unless it has an external linkage.

However, OpenMP thinks a static host func is always emitted.

If we do not consolidate IsKnownEmitted function of HIP and OpenMP, and use the current fix, for HIP, some additional diagnostics will be emitted compared to without -fopenmp in certain cases.

It seems I should figure out whether the static host func is really emitted for HIP then consolidate IsKnownEmitted func for HIP and OpenMP.

If that's tractable, then that does seem like the best solution. You can commit this if you need a shorter-term fix.

yaxunl updated this revision to Diff 221829.Sep 25 2019, 1:45 PM
yaxunl edited the summary of this revision. (Show Details)

Unify CUDA/HIP/OpenMP host/device check.

yaxunl retitled this revision from [CUDA][HIP] Fix assertion in Sema::markKnownEmitted with -fopenmp to [CUDA][HIP] Fix host/device check with -fopenmp.Sep 25 2019, 1:51 PM
yaxunl edited the summary of this revision. (Show Details)

That looks great. I'm probably not the best person to validate that the OMP / CUDA logic is still right, but it seems like the right technical design.

ping
@ABataev Any comments? Thanks.

ABataev added inline comments.Oct 1 2019, 6:28 AM
lib/Sema/SemaDecl.cpp
17618

Are you going to handle #pragma omp declare target device_type(nohost) in Cuda mode correctly? Such functions should not be emitted on the host in OpenMP 5.0.

lib/Sema/SemaOpenMP.cpp
1569–1574

You can remove the whole function and use Sema::getEmissionStatus() instead.

1593

I think, CUDADiscarded case must be unreachable and it must be a case for llvm_unreachable() in case of the OpenMP device code.

1602–1607

You can remove the whole function and use Sema::getEmissionStatus() instead.

1647–1648

I would add an assert thet this function does not return CUDADiscarded value.

1684–1685

Also, it would be good to check that it cannot return CUDADiscarded here.

yaxunl marked 12 inline comments as done.Oct 4 2019, 2:01 PM
yaxunl added inline comments.
lib/Sema/SemaDecl.cpp
17618

will do

lib/Sema/SemaOpenMP.cpp
1569–1574

done

1593

done

1602–1607

done

1647–1648

done

1684–1685

In host compilation, if the source code is CUDA/HIP language, a CUDA/HIP device function may be discarded as CUDADiscarded and show up here. For C++, CUDADiscarded should not be returned here.

yaxunl updated this revision to Diff 223284.Oct 4 2019, 2:04 PM
yaxunl marked 6 inline comments as done.

Revised by Alexey's comments.

ABataev added inline comments.Oct 7 2019, 7:01 AM
lib/Sema/SemaCUDA.cpp
605–606

I believe this function can be removed

lib/Sema/SemaDecl.cpp
17619

Better to use real type here, not auto

17623–17626

Enclose in braces

17626

Hmm, it must be marked as know-emitted only if S.DeviceKnownEmittedFns.count(FD) > 0. Otherwise, it must be unknown.

17629–17631

Enclose in braces, not goo to have else branch enclosed in braces and then branch without.

17641

Same here, it must be marked as know-emitted only if S.DeviceKnownEmittedFns.count(FD) > 0. Otherwise, it must be unknown.

17684–17689

Reformat the comment here

lib/Sema/SemaOpenMP.cpp
1658–1659

The same condition is checked twice, one of them must be for CalleeS, I believe

1697

Again, the same condition checked twice.

yaxunl updated this revision to Diff 223712.Oct 7 2019, 6:13 PM
yaxunl marked 14 inline comments as done.

revised by Alexey's comments.

ABataev added inline comments.Oct 8 2019, 6:31 AM
lib/Sema/SemaCUDA.cpp
622

I assume, better to use cast here, not dyn_cast

650

Same here, just cast

lib/Sema/SemaOpenMP.cpp
1656–1657

Better to use real type instead of auto here

1693–1694

Real types instead of auto

yaxunl updated this revision to Diff 224126.Oct 9 2019, 12:25 PM
yaxunl marked 8 inline comments as done.

revised by Alexey's comments.

lib/Sema/SemaCUDA.cpp
605–606

done

622

done

650

done

lib/Sema/SemaDecl.cpp
17619

done

17623–17626

done

17626

done

17629–17631

done

17641

done

17684–17689

done

lib/Sema/SemaOpenMP.cpp
1656–1657

done

1658–1659

done

1693–1694

done

1697

done

This revision is now accepted and ready to land.Oct 9 2019, 12:30 PM
This revision was automatically updated to reflect the committed changes.