Download Raw Diff

Details

Reviewers

tra
rjmccall
jdoerfert
ABataev

Commits

rL374263: [CUDA][HIP] Fix host/device check with -fopenmp
rG229c78d3a5d1: [CUDA][HIP] Fix host/device check with -fopenmp
rC374263: [CUDA][HIP] Fix host/device check with -fopenmp

Summary

CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation
to take advantages of multi-threads computation.

CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them
once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine
if a function is sure to be emitted.

To check host/device correctly for CUDA/HIP when -fopenmp is enabled, there needs a unified logic to determine
whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic.

This patch only affects CUDA/HIP program which are compiled with -fopenmp. It should have no effect on C++ programs
compiled with -fopenmp.

Diff Detail

Event Timeline

yaxunl created this revision.Sep 20 2019, 8:36 AM

Herald added a reviewer: jdoerfert. · View Herald TranscriptSep 20 2019, 8:36 AM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added a subscriber: guansong. · View Herald Transcript

yaxunl edited the summary of this revision. (Show Details)Sep 20 2019, 8:46 AM

rjmccall added inline comments.Sep 20 2019, 9:49 AM

lib/Sema/Sema.cpp
1510	There's an overload of `DenseMap::erase` that just takes a key value, so this whole thing can be `S.DeviceCallGraph.erase(OrigCallee);`. Why do we need to erase the entry instead of re-using it? If the call graphs are different for the two use-cases, is that conflict a problem for other reasons?

reuse the call tree.

yaxunl marked an inline comment as done.Sep 20 2019, 1:00 PM

yaxunl added inline comments.

lib/Sema/Sema.cpp
1510	I think you are right. we should reuse the call graph.

rjmccall added inline comments.Sep 20 2019, 4:00 PM

lib/Sema/Sema.cpp
1510	Okay. I believe DenseMap has a `count` that tells you how many entries there are for a key.

revised by John's comments

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

In D67837#1679670, @rjmccall wrote:

Okay. And it's okay to fall down to the code below when functions are used in both ways this way?

This part of code is for delayed checking of hostness. If a host function calls device function, we do not want to diagnose it unless we are sure the host func is to be emitted.

whether a func is emitted is determined by IsKnownEmitted function. HIP and OpenMP each has its own copy.

In this case, HIP thinks a static host func is not sure to be emitted, unless it has an external linkage.

However, OpenMP thinks a static host func is always emitted.

If we do not consolidate IsKnownEmitted function of HIP and OpenMP, and use the current fix, for HIP, some additional diagnostics will be emitted compared to without -fopenmp in certain cases.

It seems I should figure out whether the static host func is really emitted for HIP then consolidate IsKnownEmitted func for HIP and OpenMP.

If that's tractable, then that does seem like the best solution. You can commit this if you need a shorter-term fix.

Unify CUDA/HIP/OpenMP host/device check.

yaxunl retitled this revision from [CUDA][HIP] Fix assertion in Sema::markKnownEmitted with -fopenmp to [CUDA][HIP] Fix host/device check with -fopenmp.Sep 25 2019, 1:51 PM

yaxunl edited the summary of this revision. (Show Details)

That looks great. I'm probably not the best person to validate that the OMP / CUDA logic is still right, but it seems like the right technical design.

yaxunl added a reviewer: ABataev.Sep 25 2019, 4:44 PM

ping
@ABataev Any comments? Thanks.

ABataev added inline comments.Oct 1 2019, 6:28 AM

lib/Sema/SemaDecl.cpp
17618 ↗	(On Diff #221829)	Are you going to handle `#pragma omp declare target device_type(nohost)` in Cuda mode correctly? Such functions should not be emitted on the host in OpenMP 5.0.
lib/Sema/SemaOpenMP.cpp
1569–1574 ↗	(On Diff #221829)	You can remove the whole function and use `Sema::getEmissionStatus()` instead.
1593 ↗	(On Diff #221829)	I think, `CUDADiscarded` case must be unreachable and it must be a case for `llvm_unreachable()` in case of the OpenMP device code.
1602–1607 ↗	(On Diff #221829)	You can remove the whole function and use `Sema::getEmissionStatus()` instead.
1647–1648 ↗	(On Diff #221829)	I would add an assert thet this function does not return `CUDADiscarded` value.
1684–1685 ↗	(On Diff #221829)	Also, it would be good to check that it cannot return `CUDADiscarded` here.

yaxunl marked 12 inline comments as done.Oct 4 2019, 2:01 PM

yaxunl added inline comments.

lib/Sema/SemaDecl.cpp
17618 ↗	(On Diff #221829)	will do
lib/Sema/SemaOpenMP.cpp
1569–1574 ↗	(On Diff #221829)	done
1593 ↗	(On Diff #221829)	done
1602–1607 ↗	(On Diff #221829)	done
1647–1648 ↗	(On Diff #221829)	done
1684–1685 ↗	(On Diff #221829)	In host compilation, if the source code is CUDA/HIP language, a CUDA/HIP device function may be discarded as CUDADiscarded and show up here. For C++, CUDADiscarded should not be returned here.

Revised by Alexey's comments.

ABataev added inline comments.Oct 7 2019, 7:01 AM

lib/Sema/SemaCUDA.cpp
604 ↗	(On Diff #223284)	I believe this function can be removed
lib/Sema/SemaDecl.cpp
17619 ↗	(On Diff #223284)	Better to use real type here, not `auto`
17623–17626 ↗	(On Diff #223284)	Enclose in braces
17626 ↗	(On Diff #223284)	Hmm, it must be marked as know-emitted only if `S.DeviceKnownEmittedFns.count(FD) > 0`. Otherwise, it must be unknown.
17629–17631 ↗	(On Diff #223284)	Enclose in braces, not goo to have `else` branch enclosed in braces and `then` branch without.
17641 ↗	(On Diff #223284)	Same here, it must be marked as know-emitted only if `S.DeviceKnownEmittedFns.count(FD) > 0`. Otherwise, it must be unknown.
17684–17689 ↗	(On Diff #223284)	Reformat the comment here
lib/Sema/SemaOpenMP.cpp
1629–1630 ↗	(On Diff #223284)	The same condition is checked twice, one of them must be for `CalleeS`, I believe
1674 ↗	(On Diff #223284)	Again, the same condition checked twice.

revised by Alexey's comments.

ABataev added inline comments.Oct 8 2019, 6:31 AM

lib/Sema/SemaCUDA.cpp
616 ↗	(On Diff #223712)	I assume, better to use `cast` here, not `dyn_cast`
645 ↗	(On Diff #223712)	Same here, just `cast`
lib/Sema/SemaOpenMP.cpp
1627–1628 ↗	(On Diff #223712)	Better to use real type instead of `auto` here
1670–1671 ↗	(On Diff #223712)	Real types instead of `auto`

revised by Alexey's comments.

lib/Sema/SemaCUDA.cpp
616 ↗	(On Diff #223712)	done
645 ↗	(On Diff #223712)	done
604 ↗	(On Diff #223284)	done
lib/Sema/SemaDecl.cpp
17619 ↗	(On Diff #223284)	done
17623–17626 ↗	(On Diff #223284)	done
17626 ↗	(On Diff #223284)	done
17629–17631 ↗	(On Diff #223284)	done
17641 ↗	(On Diff #223284)	done
17684–17689 ↗	(On Diff #223284)	done
lib/Sema/SemaOpenMP.cpp
1627–1628 ↗	(On Diff #223712)	done
1670–1671 ↗	(On Diff #223712)	done
1629–1630 ↗	(On Diff #223284)	done
1674 ↗	(On Diff #223284)	done

This revision is now accepted and ready to land.Oct 9 2019, 12:30 PM

Closed by commit rG229c78d3a5d1: [CUDA][HIP] Fix host/device check with -fopenmp (authored by yaxunl). · Explain WhyOct 9 2019, 4:58 PM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix host/device check with -fopenmp
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 221341

lib/Sema/Sema.cpp

test/SemaCUDA/openmp-static-func.cu

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix host/device check with -fopenmpClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 221341

lib/Sema/Sema.cpp

test/SemaCUDA/openmp-static-func.cu

[CUDA][HIP] Fix host/device check with -fopenmp
ClosedPublic