This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] WIP variadics
Needs ReviewPublic

Authored by JonChesterfield on Aug 17 2023, 9:21 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Won't ship as is, testing in progress. Posting to collaborate with @jhuber6 on libc testing. snprintf passing at present. Commit message will change from this.

RFC is at https://discourse.llvm.org/t/rfc-desugar-variadics-codegen-for-new-targets-optimisation-for-existing/72854

I'm looking to implement this as an ABI lowering for amdgpu (possibly nvptx) so those backends don't have to consider variadic functions, and as an optimisation for x64/aarch64 etc.

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 17 2023, 9:21 PM
JonChesterfield requested review of this revision.Aug 17 2023, 9:21 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 17 2023, 9:21 PM

The lowering pass is broadly right, missing a few edge cases.

libc/config/gpu/entrypoints.txt
86

^these try to build, but fail. I haven't managed to debug the libc cmake to work out what I'm missing

libc/test/src/__support/CMakeLists.txt
62–70

this one builds and passes

llvm/lib/CodeGen/ExpandVAIntrinsics.cpp
159 ↗(On Diff #551373)

Cut this from the WIP patch as the draft is noisy. Ran some tests through x64 with a patched clang locally, expanding functions that pass that precondition and leaving others unchanged.

@arsenm suggested va_start/va_end should be addrspace aware, similar idea to https://reviews.llvm.org/D156666. Should be less invasive for the va intrinsics as they're not used so widely.

arsenm added inline comments.Aug 18 2023, 10:34 AM
llvm/lib/CodeGen/ExpandVAIntrinsics.cpp
38 ↗(On Diff #551373)

Don't need

44–47 ↗(On Diff #551373)

Don't understand the point of this

151 ↗(On Diff #551373)

isDefinitionExact?

If this is just how the ABI is going to lower, I don't see why you need to worry about any such restrictions

181 ↗(On Diff #551373)

Probably should defend against sret and the other ABI attributes

209 ↗(On Diff #551373)

How is StructType::create not using Twine

215–217 ↗(On Diff #551373)

What's wrong with just Builder.CreateAlloca?

217 ↗(On Diff #551373)

what's wrong with the abi type alignment? why does the stack alignment matter?

226 ↗(On Diff #551373)

alignTo shouldn't be difficult

229–230 ↗(On Diff #551373)

just CreateAddrSpaceCast should work?

253 ↗(On Diff #551373)

there are more CallBase types than call and invoke now. Can't you just mutate the call operand in place?

258 ↗(On Diff #551373)

I'd assume all metadata would be preservable if this is just how the ABI works

272–276 ↗(On Diff #551373)

no return after else,

llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
3

-passes.

Also "generic" codegen tests are a fiction, use a real triple

49

Use named values

77

Needs some indirect variadic call tests

arsenm added inline comments.Aug 18 2023, 10:38 AM
llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
77

Also some metadata and signext/zeroext preservation tests

77

Also a case where the user isn't the call operand

  • Rename ExpandVAIntrinsics to DesugarVariadics
JonChesterfield edited the summary of this revision. (Show Details)Aug 18 2023, 12:54 PM
JonChesterfield edited the summary of this revision. (Show Details)Aug 18 2023, 1:19 PM
arsenm added inline comments.Aug 18 2023, 1:21 PM
libc/config/gpu/entrypoints.txt
84–85

Split of the libc stuff into a separate patch, the lowering pass should be a standalone change

arsenm added inline comments.Aug 18 2023, 1:25 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
22

Can you expand on the ABI requirements of this in the comment? If we make this be the way the ABI works for AMDGPU, we don't need any conditions. Should there be some control for the targets with established ABIs to apply this to internal functions?

38

Don't need cstdio

102–103

Can you use nontrivial types with this? Might be better to just start with memcpy

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
621

desugar is a weird name for a lowering pass

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
188

This is rather late, I'd think we'd be better off with an earlier run (e.g. as part of the initial module passes, along with sanitizers)

llvm/test/CodeGen/AMDGPU/unsupported-calls.ll
57–58

Deleted the wrong error?

JonChesterfield added inline comments.
llvm/lib/CodeGen/ExpandVAIntrinsics.cpp
215–217 ↗(On Diff #551373)

I want to mark the callee parameter with alignment metadata so the pointer alignment can constant fold (todo, said folding isn't currently working very well).

The struct contains whatever the caller was passing which we don't know much about. If we call CreateAlloca, it gets whatever alignment suffices for those fields.

If those two don't line up nicely, you get bad times. I'm using the stack alignment from the target string as a heuristic for a reasonable alignment to use for things on the stack. For amdgpu that's 4, which means passing a double gets an dynamic realignment which doesn't optimise out, but otherwise it seems reasonable.

arsenm added inline comments.Aug 18 2023, 1:27 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
208–209

Should we go for a packed struct forced to align 4? There's no upside to stack values with > 4 align

arsenm added inline comments.Aug 18 2023, 1:30 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
297

Test the comdat? Weird that copyAttributesFrom seems to not cover it

339

Add a test with a def and decl in a global initializer, and with a constantexpr cast in the initializer. I think this should also handle blockaddress

Name candidates

  • expand
  • lower
  • desugar
  • transform

Lowering probably makes the most sense for the abi level apply to all functions, I like desugar to cover rewriting a subset of the graph

libc/config/gpu/entrypoints.txt
84–85

There's quite a lot of wip in this patch. Your feedback is welcome, but you could also wait for a cleaned up version with your name on the reviewer list if you prefer. The libc plumbing is keeping track of which parts are currently passing for collaboration with jhuber6

llvm/test/CodeGen/AMDGPU/unsupported-calls.ll
57–58

this is curious - the test is passing as written, maybe a quirk of running it with not llc

arsenm added inline comments.Aug 18 2023, 1:33 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
74–77

Can you use ptrmask?

142

This would be better as a pass parameter. Don't see much point to specifying individual functions by name

arsenm added inline comments.Aug 18 2023, 1:37 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
296

This might be missing copying the linkage

arsenm added inline comments.Aug 18 2023, 1:48 PM
llvm/lib/CodeGen/DesugarVariadics.cpp
145

I think you need to guard against calls to variadic intrinsics