This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix host/device check with -fopenmp
ClosedPublic

Authored by yaxunl on Sep 20 2019, 8:36 AM.

Details

Summary

CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation
to take advantages of multi-threads computation.

CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them
once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine
if a function is sure to be emitted.

To check host/device correctly for CUDA/HIP when -fopenmp is enabled, there needs a unified logic to determine
whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic.

This patch only affects CUDA/HIP program which are compiled with -fopenmp. It should have no effect on C++ programs
compiled with -fopenmp.

Diff Detail

Event Timeline

yaxunl created this revision.Sep 20 2019, 8:36 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: guansong. · View Herald Transcript
yaxunl edited the summary of this revision. (Show Details)Sep 20 2019, 8:46 AM
rjmccall added inline comments.Sep 20 2019, 9:49 AM
lib/Sema/Sema.cpp
1510

There's an overload of DenseMap::erase that just takes a key value, so this whole thing can be S.DeviceCallGraph.erase(OrigCallee);.

Why do we need to erase the entry instead of re-using it? If the call graphs are different for the two use-cases, is that conflict a problem for other reasons?

yaxunl updated this revision to Diff 221097.Sep 20 2019, 12:59 PM

reuse the call tree.

yaxunl marked an inline comment as done.Sep 20 2019, 1:00 PM
yaxunl added inline comments.
lib/Sema/Sema.cpp
1510

I think you are right. we should reuse the call graph.

rjmccall added inline comments.Sep 20 2019, 4:00 PM
lib/Sema/Sema.cpp
1510

Okay. I believe DenseMap has a count that tells you how many entries there are for a key.

yaxunl updated this revision to Diff 221341.Sep 23 2019, 8:27 AM

revised by John's comments

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

This part of code is for delayed checking of hostness. If a host function calls device function, we do not want to diagnose it unless we are sure the host func is to be emitted.

whether a func is emitted is determined by IsKnownEmitted function. HIP and OpenMP each has its own copy.

In this case, HIP thinks a static host func is not sure to be emitted, unless it has an external linkage.

However, OpenMP thinks a static host func is always emitted.

If we do not consolidate IsKnownEmitted function of HIP and OpenMP, and use the current fix, for HIP, some additional diagnostics will be emitted compared to without -fopenmp in certain cases.

It seems I should figure out whether the static host func is really emitted for HIP then consolidate IsKnownEmitted func for HIP and OpenMP.

If that's tractable, then that does seem like the best solution. You can commit this if you need a shorter-term fix.

yaxunl updated this revision to Diff 221829.Sep 25 2019, 1:45 PM
yaxunl edited the summary of this revision. (Show Details)

Unify CUDA/HIP/OpenMP host/device check.

yaxunl retitled this revision from [CUDA][HIP] Fix assertion in Sema::markKnownEmitted with -fopenmp to [CUDA][HIP] Fix host/device check with -fopenmp.Sep 25 2019, 1:51 PM
yaxunl edited the summary of this revision. (Show Details)

That looks great. I'm probably not the best person to validate that the OMP / CUDA logic is still right, but it seems like the right technical design.

ping
@ABataev Any comments? Thanks.

ABataev added inline comments.Oct 1 2019, 6:28 AM
lib/Sema/SemaDecl.cpp
17618 ↗(On Diff #221829)

Are you going to handle #pragma omp declare target device_type(nohost) in Cuda mode correctly? Such functions should not be emitted on the host in OpenMP 5.0.

lib/Sema/SemaOpenMP.cpp
1569–1574 ↗(On Diff #221829)

You can remove the whole function and use Sema::getEmissionStatus() instead.

1593 ↗(On Diff #221829)

I think, CUDADiscarded case must be unreachable and it must be a case for llvm_unreachable() in case of the OpenMP device code.

1602–1607 ↗(On Diff #221829)

You can remove the whole function and use Sema::getEmissionStatus() instead.

1647–1648 ↗(On Diff #221829)

I would add an assert thet this function does not return CUDADiscarded value.

1684–1685 ↗(On Diff #221829)

Also, it would be good to check that it cannot return CUDADiscarded here.

yaxunl marked 12 inline comments as done.Oct 4 2019, 2:01 PM
yaxunl added inline comments.
lib/Sema/SemaDecl.cpp
17618 ↗(On Diff #221829)

will do

lib/Sema/SemaOpenMP.cpp
1569–1574 ↗(On Diff #221829)

done

1593 ↗(On Diff #221829)

done

1602–1607 ↗(On Diff #221829)

done

1647–1648 ↗(On Diff #221829)

done

1684–1685 ↗(On Diff #221829)

In host compilation, if the source code is CUDA/HIP language, a CUDA/HIP device function may be discarded as CUDADiscarded and show up here. For C++, CUDADiscarded should not be returned here.

yaxunl updated this revision to Diff 223284.Oct 4 2019, 2:04 PM
yaxunl marked 6 inline comments as done.

Revised by Alexey's comments.

ABataev added inline comments.Oct 7 2019, 7:01 AM
lib/Sema/SemaCUDA.cpp
604 ↗(On Diff #223284)

I believe this function can be removed

lib/Sema/SemaDecl.cpp
17619 ↗(On Diff #223284)

Better to use real type here, not auto

17623–17626 ↗(On Diff #223284)

Enclose in braces

17626 ↗(On Diff #223284)

Hmm, it must be marked as know-emitted only if S.DeviceKnownEmittedFns.count(FD) > 0. Otherwise, it must be unknown.

17629–17631 ↗(On Diff #223284)

Enclose in braces, not goo to have else branch enclosed in braces and then branch without.

17641 ↗(On Diff #223284)

Same here, it must be marked as know-emitted only if S.DeviceKnownEmittedFns.count(FD) > 0. Otherwise, it must be unknown.

17684–17689 ↗(On Diff #223284)

Reformat the comment here

lib/Sema/SemaOpenMP.cpp
1629–1630 ↗(On Diff #223284)

The same condition is checked twice, one of them must be for CalleeS, I believe

1674 ↗(On Diff #223284)

Again, the same condition checked twice.

yaxunl updated this revision to Diff 223712.Oct 7 2019, 6:13 PM
yaxunl marked 14 inline comments as done.

revised by Alexey's comments.

ABataev added inline comments.Oct 8 2019, 6:31 AM
lib/Sema/SemaCUDA.cpp
616 ↗(On Diff #223712)

I assume, better to use cast here, not dyn_cast

645 ↗(On Diff #223712)

Same here, just cast

lib/Sema/SemaOpenMP.cpp
1627–1628 ↗(On Diff #223712)

Better to use real type instead of auto here

1670–1671 ↗(On Diff #223712)

Real types instead of auto

yaxunl updated this revision to Diff 224126.Oct 9 2019, 12:25 PM
yaxunl marked 8 inline comments as done.

revised by Alexey's comments.

lib/Sema/SemaCUDA.cpp
616 ↗(On Diff #223712)

done

645 ↗(On Diff #223712)

done

604 ↗(On Diff #223284)

done

lib/Sema/SemaDecl.cpp
17619 ↗(On Diff #223284)

done

17623–17626 ↗(On Diff #223284)

done

17626 ↗(On Diff #223284)

done

17629–17631 ↗(On Diff #223284)

done

17641 ↗(On Diff #223284)

done

17684–17689 ↗(On Diff #223284)

done

lib/Sema/SemaOpenMP.cpp
1627–1628 ↗(On Diff #223712)

done

1670–1671 ↗(On Diff #223712)

done

1629–1630 ↗(On Diff #223284)

done

1674 ↗(On Diff #223284)

done

This revision is now accepted and ready to land.Oct 9 2019, 12:30 PM
This revision was automatically updated to reflect the committed changes.