The code was structured in a way that implied every split argument is
either entirely in memory or entirely in registers. It is possible to
pass an original argument partially in registers and partially in
memory. Transpose the logic here to only consider a single piece at a
time. Every individual CCValAssign should be treated independently, and
any merge back to the original value needs to be handled later.
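
A minimal sketch of the per-piece idea, not the actual LLVM code:
PieceLoc is a hypothetical stand-in for a single CCValAssign, and
assignPieces is an invented helper name. It only illustrates that each
piece is dispatched on its own location kind, so one argument can mix
register and stack pieces.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one CCValAssign: a single piece of a split
// argument, located either in a register or in a stack slot.
struct PieceLoc {
  bool IsReg;           // true: register location, false: memory location
  unsigned RegOrOffset; // register number, or byte offset from the stack pointer
  unsigned SizeInBytes; // size of this piece
};

// Handle every piece independently. Nothing here assumes that sibling
// pieces of the same original argument share a location kind.
std::vector<std::string> assignPieces(const std::vector<PieceLoc> &Pieces) {
  std::vector<std::string> Actions;
  for (const PieceLoc &P : Pieces) {
    if (P.IsReg)
      Actions.push_back("copy to reg " + std::to_string(P.RegOrOffset));
    else
      Actions.push_back("store " + std::to_string(P.SizeInBytes) +
                        " bytes at sp+" + std::to_string(P.RegOrOffset));
  }
  return Actions;
}
```

Merging the pieces back into the original value is deliberately left
out of the loop, matching the "handled later" split described above.
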
This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code. This was
intended to be NFC, but it does partially address a FIXME in the
memloc handling.

As a result, this does slightly change AArch64 handling of some
promoted arguments passed on the stack. The store will be emitted with
the smaller piece type rather than as a wider store of an anyext
value. I think this exposes a failure to merge stores later, as the
change in swifterror replaces a single 64-bit stp with two 4-byte str
instructions.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size so it does not exceed the size of the value
register, which avoids explicit trunc/extend/vector widen/vector
extract instructions. This happens for AMDGPU for i8 arguments that end
up passed on the stack, which are promoted to i16 (I think this is a
preexisting DAG bug though, and they should not really be promoted when
in memory).
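
The clamping rule can be sketched as follows. This is an illustration
under assumptions, not the code in this patch: clampedMemAccessSize and
both parameter names are hypothetical, and sizes are taken in bytes.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: number of bytes to access for one memory piece.
// LocSizeInBytes is the size of the (possibly promoted) location type;
// ValSizeInBytes is the size of the value register holding the piece.
inline uint64_t clampedMemAccessSize(uint64_t LocSizeInBytes,
                                     uint64_t ValSizeInBytes) {
  // Never access more memory than the value register can supply or hold.
  return std::min(LocSizeInBytes, ValSizeInBytes);
}
```

For the AMDGPU case above, an i8 value with an i16 location would give
clampedMemAccessSize(2, 1), so the access stays one byte wide.
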