Since D83271 we can optimize the GPU state machine to avoid spurious
call edges that increase the register usage of kernels. With this patch
we inform the user why and if this optimization is happening and when it
is not.
Details
Details
Diff Detail
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Unit Tests
Event Timeline
| clang/test/OpenMP/remarks_parallel_in_target_state_machine.c | ||
|---|---|---|
| 13 | Add a space "machineomp_outlined2_wrapper" to "machine omp_outlined2_wrapper" | |
clang-format: please reformat the code
-#pragma omp parallel // #1 - // expected-remark@#1 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage in unrelated kernels in the same translation unit due to spurious call edges assumed by ptxas.}} +#pragma omp parallel // #1 \ + // expected-remark@#1 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage in unrelated kernels in the same translation unit due to spurious call edges assumed by ptxas.}} \