Page MenuHomePhabricator

[NVPTX] Unforce minimum alignment of 4 for byval arguments of device-side functions.
Needs ReviewPublic

Authored by pavelkopyl on Jan 13 2023, 4:48 PM.

Details

Reviewers
tra
Summary

Minimum alignment of 4 for byval arguments was forced to workaround
a bug in old versions of ptxas. Details: https://reviews.llvm.org/D22428.
Recent ptxas versions (> 9.0) do not seem to have this bug, so alignment
requirement was relaxed. To force again minimum alignment of 4, use
'-force-min-byval-param-align' option.

Diff Detail

Unit TestsFailed

TimeTest
60,210 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vloxseg.c
60,230 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vluxseg.c
60,240 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vloxseg.c
60,260 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vluxseg.c

Event Timeline

pavelkopyl created this revision.Jan 13 2023, 4:48 PM
pavelkopyl requested review of this revision.Jan 13 2023, 4:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2023, 4:48 PM
tra added a comment.Jan 13 2023, 5:04 PM

Do we need to do it? I mean -- are there any observable differences between enforced alignment of 4 and not? For register-passed parameters it will make no difference, and for the parameters passed via constant/local memory, it may or may not make a difference.

I guess at the extreme it would impact how many 1-byte parameters we may have, as the total size of param address space has an upper bound (for kernels it's 4K: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#parameter-buffer-layout). If that's the case, then it probably does not matter whether we can pass 4K arguments or only 1K.

In other words, I do not see a practical need to introduce an option which may not have any practical use.

pavelkopyl edited the summary of this revision. (Show Details)Jan 25 2023, 4:56 PM

Do we need to do it? I mean -- are there any observable differences between enforced alignment of 4 and not? For register-passed parameters it will make no difference, and for the parameters passed via constant/local memory, it may or may not make a difference.

I guess at the extreme it would impact how many 1-byte parameters we may have, as the total size of param address space has an upper bound (for kernels it's 4K: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#parameter-buffer-layout). If that's the case, then it probably does not matter whether we can pass 4K arguments or only 1K.

In other words, I do not see a practical need to introduce an option which may not have any practical use.

Sorry for long response.
OK, I'll remove the option. I just wanted to clarify, are you agree in general with removing this workaround?

pavelkopyl added a comment.EditedFeb 2 2023, 3:53 AM

I guess, there is one more reason to getting back to default alignment.It seems we break ABI with alignment forced to 4. This may be an issue for mixed build clang + nvcc.