Since D83271 we can optimize the GPU state machine to avoid spurious
call edges that increase the register usage of kernels. With this patch
we inform the user if, and why, this optimization is happening, and when
it is not.
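The following minimal C sketch illustrates the situation these remarks describe. It contains a `#pragma omp parallel` region inside a function that is called from a plain (non-combined, hence non-SPMD) `target` region, so on a GPU the parallel region is reached through the state machine. The names `partial_sum` and `sum_on_target` are hypothetical. One assumption is inferred from the remark wording ("maybe the surrounding function has external linkage?") and not taken from the patch itself: giving the surrounding function internal linkage (`static`) is what lets the compiler see that the parallel region is called from a single target region only.

```c
#include <assert.h>

/* Hypothetical sketch: partial_sum and sum_on_target are illustrative
 * names, not part of the patch. */
#pragma omp declare target
/* Internal linkage (static). Assumption: because no other translation unit
 * can call this function, the optimizer can tell that the parallel region
 * below is reached only from the single target region in sum_on_target. */
static int partial_sum(int n) {
  int sum = 0;
  /* A parallel region inside a function called from a target region that
   * is not a combined `target parallel` construct, i.e. the non-SPMD case
   * that would typically go through the GPU state machine. Without OpenMP
   * enabled the pragma is ignored and the loop runs serially; the
   * reduction makes the result the same either way. */
#pragma omp parallel for reduction(+ : sum)
  for (int i = 0; i < n; ++i)
    sum += i;
  return sum;
}
#pragma omp end declare target

/* Runs partial_sum in a target region; with default settings this executes
 * on the host when no offload device is available. */
int sum_on_target(int n) {
  assert(n >= 0);
  int result = 0;
#pragma omp target map(from : result)
  result = partial_sum(n);
  return result;
}
```

For example, `sum_on_target(10)` yields 45. Compiling such a file for a GPU with clang's optimization remarks enabled (the `-Rpass` family of flags; the exact pass-name filter is an assumption here) is the kind of setup in which the remarks added by this patch would be reported.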
Repository: rG LLVM Github Monorepo
clang/test/OpenMP/remarks_parallel_in_target_state_machine.c, line 12:
Add a space: "machine__omp_outlined__2_wrapper" should read "machine __omp_outlined__2_wrapper".
clang-format: please reformat the code
```diff
- #pragma omp parallel // #1
- // expected-remark@#1 {{Found parallel region that is called through a state machine__omp_outlined__2_wrapper in non-SPMD target region. This can lead to excessive register usage in unrelated kernels in the same translation unit due to spurious call edges assumed by ptxas.}}
- // expected-remark@#1 {{Parallel region __omp_outlined__2_wrapper is not known to be called from a single target region only, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}}
- {
- }
+#pragma omp parallel // #1
+ // expected-remark@#1 {{Found parallel region that is called through a state machine__omp_outlined__2_wrapper in non-SPMD target region. This can lead to excessive register usage in unrelated kernels in the same translation unit due to spurious call edges assumed by ptxas.}}
+ // expected-remark@#1 {{Parallel region __omp_outlined__2_wrapper is not known to be called from a single target region only, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}}
+ {
+ }
```