This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't elide argument copies for scalarized vectors (PR63475)
ClosedPublic

Authored by nikic on Jun 29 2023, 6:46 AM.

Details

Summary

When eliding argument copies, the memory layout between a plain store of the type and the layout of the argument lowering on the stack must match. For multi-part argument lowerings, this is not necessarily the case.

The code already tried to prevent this optimization for "scalarized and extended" vectors, but the check for "extends" was incomplete. While a scalarized vector of i32s stores i32 values on the stack, these are stored in 8 byte stack slots (on x86_64), so effectively have padding.

Rather than trying to add more special cases to handle this (which is not straightforward), I'm going in the other direction and exclude scalarized vectors from this optimization entirely. This seems like a rare case that is not worth the hassle -- the complete lack of test coverage is not reassuring either.

Fixes https://github.com/llvm/llvm-project/issues/63475.

Diff Detail

Event Timeline

nikic created this revision.Jun 29 2023, 6:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 6:46 AM
nikic requested review of this revision.Jun 29 2023, 6:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 6:46 AM
nikic retitled this revision from [X86] Don't elide argument copies for scalarized vectors to [X86] Don't elide argument copies for scalarized vectors (PR63475).Jul 3 2023, 1:59 AM
nikic edited the summary of this revision. (Show Details)

Why do we change the distance from 4 to 8 in callee rather than change 8 to 4 for caller?
Although we don't have a standard calling converion for illegal types in IR, and what makes it more corner is its power-of-2 vector <8 x i32> is illegal too (https://godbolt.org/z/ds9WGPTeP), I think we still should make it close to practical converion, i.e., https://godbolt.org/z/YoMbzxzr8

nikic added a comment.Jul 6 2023, 7:23 AM

Why do we change the distance from 4 to 8 in callee rather than change 8 to 4 for caller?
Although we don't have a standard calling converion for illegal types in IR, and what makes it more corner is its power-of-2 vector <8 x i32> is illegal too (https://godbolt.org/z/ds9WGPTeP), I think we still should make it close to practical converion, i.e., https://godbolt.org/z/YoMbzxzr8

My understanding of how this works is that we legalize the vector into (legally typed) parts, and then those parts get assigned to registers until we run out of argument registers. At that point the remainder is passed on the stack. How that looks like is determined by the calling convention -- on x86_64 it means that the 32-bit registers get pushed into 64-bit stack slots.

I don't think we can make the 32-bit registers spill into 32-bit stack slots for the specific case that they come from an illegal vector -- and it wouldn't make a lot of sense, if you consider that only parts of the vector might get passed via stack, and the rest via registers, so you wouldn't get a complete vector on the stack anyway.

The other thing we can do is to chose a better argument passing legalization for the vector, and e.g. pass it via combination of widening and splitting as two <4 x i32> vectors instead. This would of course be the more efficient calling convention.

The problem here is that SDAG argument lowering currently doesn't support this. It supports widening a vector and it supports splitting a vector, but it doesn't support doing both. I took a look, and supporting this doesn't look particularly easy.

In any case, I still think that the fix here makes sense, in that if a vector gets scalarized, we should disable the argument copy elision optimization. The question whether the vector gets scalarized or not is ultimately a separate one...

pengfei accepted this revision.Jul 13 2023, 5:14 AM

LGTM. I don't think the calling convention of illegal type matters much and we can always change it in the future.

llvm/test/CodeGen/X86/pr63475.ll
99–100

Is it out of bound?

This revision is now accepted and ready to land.Jul 13 2023, 5:14 AM