This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Restructure argument lowering loop in handleAssignments
ClosedPublic

Authored by arsenm on Jul 8 2020, 12:30 PM.

Details

Summary

This was structured in a way that implied every split argument is in
memory, or in registers. It is possible for a pass a original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code. This was
intended to be NFC, but it does partially address a FIXME in the
memloc handling.

As a result, this does slightly change AArch64 handling of some
promoted arguments passed on the stack. The store will be emitted as
the smaller, piece type rather than a wider store of an anyext
value. I think this exposes a failure to merge stores later, as the
change in swifterror replaces a single 64-bit stp with 2 4-byte str.

I'm also not sure what the correct behavior for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).

Diff Detail

Event Timeline

arsenm created this revision.Jul 8 2020, 12:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2020, 12:30 PM

I don't think we should be changing the extending behavior for memlocs.

Just for the varargs case in Darwin, there's lots of code out there which incorrectly try to interpret a sub 64bit incoming varargs parameter as a 64 bit value. Although it should be technically correct to emit a smaller store, what happens in practice is that this code breaks for very hard to detect reasons (i.e. you no longer get a free zeroing of the upper bits of the stack slot). In arm64 we could force this to always explicitly zero-extend to 64 bits but that incurs a penalty at the call site.

It's unfortunate but copying the DAG behavior here is likely to cause less pain.

arsenm updated this revision to Diff 277264.Jul 11 2020, 2:56 PM

Use LocVT, which this code should have been using in the first place which avoids shrinking the store.

I also found the 2 forms of assignValueToAddress confusing, and the full form interface can't handle the case where a single stack slot covers multiple registers (although it seems unlikely this would ever be needed)

aemerson added inline comments.Jul 14 2020, 10:55 AM
llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
11 ↗(On Diff #277264)

This test is actually falling back to SDAG. If you add -global-isel-abort=1 it crashes.

arsenm updated this revision to Diff 277936.Jul 14 2020, 12:27 PM
arsenm marked an inline comment as done.

Remove no longer fixed aarch64 case

llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
11 ↗(On Diff #277264)

I think enabling the fallback is a bad default for llc

aemerson added inline comments.Jul 14 2020, 12:45 PM
llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
11 ↗(On Diff #277264)

Probably true.

aemerson accepted this revision.Jul 21 2020, 8:49 PM

Sorry, didn’t realise you’d updated the patch.

This revision is now accepted and ready to land.Jul 21 2020, 8:49 PM