This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Unforce minimum alignment of 4 for byval arguments of device-side functions.
AcceptedPublic

Authored by pavelkopyl on Jan 13 2023, 4:48 PM.

Details

Reviewers
tra
Summary

Minimum alignment of 4 for byval arguments was forced to workaround
a bug in old versions of ptxas. Details: https://reviews.llvm.org/D22428.
Recent ptxas versions (> 9.0) do not seem to have this bug, so alignment
requirement was relaxed. To force again minimum alignment of 4, use
'-force-min-byval-param-align' option.

Diff Detail

Event Timeline

pavelkopyl created this revision.Jan 13 2023, 4:48 PM
pavelkopyl requested review of this revision.Jan 13 2023, 4:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2023, 4:48 PM
tra added a comment.Jan 13 2023, 5:04 PM

Do we need to do it? I mean -- are there any observable differences between enforced alignment of 4 and not? For register-passed parameters it will make no difference, and for the parameters passed via constant/local memory, it may or may not make a difference.

I guess at the extreme it would impact how many 1-byte parameters we may have, as the total size of param address space has an upper bound (for kernels it's 4K: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#parameter-buffer-layout). If that's the case, then it probably does not matter whether we can pass 4K arguments or only 1K.

In other words, I do not see a practical need to introduce an option which may not have any practical use.

pavelkopyl edited the summary of this revision. (Show Details)Jan 25 2023, 4:56 PM

Do we need to do it? I mean -- are there any observable differences between enforced alignment of 4 and not? For register-passed parameters it will make no difference, and for the parameters passed via constant/local memory, it may or may not make a difference.

I guess at the extreme it would impact how many 1-byte parameters we may have, as the total size of param address space has an upper bound (for kernels it's 4K: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#parameter-buffer-layout). If that's the case, then it probably does not matter whether we can pass 4K arguments or only 1K.

In other words, I do not see a practical need to introduce an option which may not have any practical use.

Sorry for long response.
OK, I'll remove the option. I just wanted to clarify, are you agree in general with removing this workaround?

pavelkopyl added a comment.EditedFeb 2 2023, 3:53 AM

I guess, there is one more reason to getting back to default alignment.It seems we break ABI with alignment forced to 4. This may be an issue for mixed build clang + nvcc.

Hi @tra, does it make sense ABI related reason for you, or it's reasonable just to drop this change?

tra accepted this revision.Apr 24 2023, 11:22 AM

Hi @tra, does it make sense ABI related reason for you, or it's reasonable just to drop this change?

Yes. Removing this workaround makes sense.

This revision is now accepted and ready to land.Apr 24 2023, 11:22 AM