This is an archive of the discontinued LLVM Phabricator instance.

clang/X86: Don't emit "min-legal-vector-width"="0"
Needs Revision · Public

Authored by arsenm on Dec 8 2022, 5:49 AM.

Details

Summary

This should be NFC as far as clang end user experience is concerned,
but it does change X86's interpretation of IR that doesn't explicitly
specify the attribute.

This is clutter that's always annoyed me. I have no idea what this
does and just don't want to see it anymore. This was previously
attempted in D97116, which was reverted; that patch didn't attempt to
deal with the X86 backend's backwards treatment of the unset-attribute
case. I tried out the reported regressing sample, and end to end I get
the same result before and after.

I have no idea what this attribute means, and it seems to be very x86
specific spaghetti spread all around. It's constantly getting
recomputed. The IR documentation was only recently added, and makes no
sense to me. I have no idea what "legal" means in this context. At
first glance this corresponds to clang's min_vector_width attribute,
with a slightly different name. There seems to be an aggressive amount
of reinterpretation of this value.

The adventure begins in the calling convention lowering, in
X86_64ABIInfo::classifyRegCallStructType, which computes a
"MaxVectorWidth" and updates it based on the C types of certain
arguments. That value is then passed to
CGFunctionInfo::setMaxVectorWidth.
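
In rough outline, that computation looks like the following. This is a simplified sketch of the shape of the reduction, not the real clang code: structFields is a hypothetical stand-in for the record-walking logic, which also recurses into nested aggregates.

```cpp
// Sketch of the regcall classification reduction (simplified; the real
// code is X86_64ABIInfo::classifyRegCallStructType and tracks more state).
uint64_t MaxVectorWidth = 0;
for (QualType FieldTy : structFields(Ty)) {  // structFields is hypothetical
  if (FieldTy->isVectorType()) {
    // Track the widest vector encountered in the aggregate, in bits.
    MaxVectorWidth = std::max(MaxVectorWidth, getContext().getTypeSize(FieldTy));
  }
}
FI.setMaxVectorWidth(MaxVectorWidth);  // stashed on the CGFunctionInfo
```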

CodeGenFunction::StartFunction then initializes LargestVectorWidth
based on MinVectorWidthAttr. CodeGenFunction::EmitCall computes a max
vector width again, based on IR types, for call sites? Yet another
reduction over the IR argument and return types is (redundantly?)
performed before final emission of the attribute. The result is then
raised to the value apparently computed during the initial calling
convention lowering?
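
Pieced together, the final emission step behaves roughly like this. A simplified sketch modeled on CodeGenFunction::FinishFunction; error handling, and the pointer-to-vector gaps mentioned below, are elided.

```cpp
// Sketch of the last reduction before the attribute is written out.
uint64_t Largest = LargestVectorWidth;  // seeded in StartFunction

// Yet another walk over the IR argument and return types.
for (llvm::Argument &Arg : CurFn->args())
  if (auto *VT = llvm::dyn_cast<llvm::VectorType>(Arg.getType()))
    Largest = std::max(Largest,
                       VT->getPrimitiveSizeInBits().getKnownMinValue());
if (auto *VT = llvm::dyn_cast<llvm::VectorType>(CurFn->getReturnType()))
  Largest = std::max(Largest,
                     VT->getPrimitiveSizeInBits().getKnownMinValue());

// Raised to the value computed during calling convention lowering.
Largest = std::max(Largest, (uint64_t)CurFnInfo->getMaxVectorWidth());

CurFn->addFnAttr("min-legal-vector-width", llvm::utostr(Largest));
```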

In the middle end there's some additional insanity with attribute
propagation, along with further argument type reductions. Finally, the
actual use modifies the X86Subtarget construction. It defaults to
UINT_MAX if the attribute isn't set, contrary to the 0 apparently used
as the default everywhere else. The point at which the subtarget is
constructed isn't well defined, so the underlying function's
min-legal-vector-width may have changed later during these attribute
propagation passes. Most of this code also consistently fails to
handle vectors of pointers correctly.
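
For contrast, the consuming side in the backend looks roughly like this. A simplified sketch of the attribute parsing in X86TargetMachine::getSubtargetImpl; the subtarget-key bookkeeping is elided and details may differ.

```cpp
// Sketch: how X86 reads the attribute when building a subtarget.
unsigned RequiredVectorWidth = UINT32_MAX;  // unset means "trust nothing",
                                            // the opposite of the 0 default
                                            // used in the frontend
llvm::Attribute WidthAttr = F.getFnAttribute("min-legal-vector-width");
if (WidthAttr.isValid()) {
  unsigned Width;
  // The value is stored as a string; a parse failure leaves UINT32_MAX.
  if (!WidthAttr.getValueAsString().getAsInteger(0, Width))
    RequiredVectorWidth = Width;
}
// RequiredVectorWidth feeds the subtarget cache key, so functions that
// differ only in this attribute get distinct X86Subtarget instances.
```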

The documentation and commit message for D123284 make no sense to
me. They imply it's an ABI attribute, but the fact that it changes in
so many places, and that clang doesn't emit it on declarations,
implies it can't be one.

This avoids the attribute in the most common case, but it's still
emitted for any function with a vector in the arguments or return
type. It should perhaps be moved into x86 specific attribute
emission. It would be reasonable to treat this as a generic attribute
if it were a 1:1 mapping from the source level attribute, merely
passed through to the backend. Is there a reason we can't just have it
pass through and let x86 do the IR argument type inspection later? To
minimize test churn, I added yet another reduction to handle
unannotated functions in the subtarget construction. I believe this
should be removed and the remaining tests updated.

The net effect of all this complexity seems to be pretty small, only
changing the handling of very large vectors on subtargets with avx512.
I updated the tests by merely placing min-legal-vector-width=512 (some
of the test failures without this looked reasonable to just update,
though they looked like regressions).

Diff Detail

Event Timeline

arsenm created this revision. Dec 8 2022, 5:49 AM
Herald added a project: Restricted Project. Dec 8 2022, 5:49 AM
arsenm requested review of this revision. Dec 8 2022, 5:49 AM
Herald added a subscriber: wdng.
pengfei requested changes to this revision. Edited Dec 8 2022, 7:20 AM

The use of min-legal-vector-width doesn't look great to me either. I'm more than glad to remove it entirely if we can do so without any user perceivable effects.
I cannot agree with this change, because it neither eliminates the clutter (it makes it even worse [1]) nor is NFC to the end user.
I think we used UINT32_MAX just to be compatible with BC files generated before the attribute was introduced. This change definitely breaks that compatibility.
Placing a "min-legal-vector-width"="512" doesn't make any sense either. For one thing, we cannot place the attribute in pre-built BC files; for another, 512 is the current maximum vector width supported on X86. We cannot guarantee there will be no 1024, 2048, etc. in the future, and we cannot change the value once it has been compiled into BC files.

[1] "min-legal-vector-width"="0" clearly indicated there are only scalar operations. Dropping it is not clear, especially while we still set other values like 128, 256, etc.

This revision now requires changes to proceed. Dec 8 2022, 7:20 AM

The attribute is supposed to tell the backend if there were any vectors used in the C source code and how big they were. A value of 0 means the code didn't contain any vectors.

The backend assumes that lack of the attribute means the IR didn't come from clang and can't be trusted to have been audited. That's why the backend uses UINT32_MAX.

If the attribute is present and less than 512, avx512 is enabled, and we're targeting certain CPUs with an avx512 frequency penalty, the backend type legalizer in X86 will not have 512 bit vectors as legal types. This will cause any 512-bit or wider vectors to be split. Such vectors were likely emitted by the loop or SLP vectorizer, which isn't bound to the legal vector width.
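
Concretely, the subtarget ends up gating 512-bit register usage on a predicate shaped like the following (simplified from X86Subtarget; member names are approximate):

```cpp
// 512-bit vector types are only treated as legal when the CPU has no
// frequency-penalty-driven preference for narrower vectors, or the
// function's min-legal-vector-width demands more than 256 bits.
bool useAVX512Regs() const {
  return hasAVX512() && (canExtendTo512DQ() || RequiredVectorWidth > 256);
}
```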

If the C source used a 512 bit vector explicitly, whether via a target specific intrinsic, a function argument or return, inline assembly, etc., we need to have the backend treat the type as legal. If we don't do that, splitting the type could be incorrect. It could cause an ABI break across translation units, or cause the backend legalizer to need to split something it isn't capable of splitting, like a target specific intrinsic.
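
For illustration, a hypothetical translation unit; the attribute values in the comments are what the scheme described above implies, not verified compiler output:

```cpp
// 512-bit vector type via clang's vector_size extension.
typedef float v16f32 __attribute__((vector_size(64)));

// A 512-bit vector crosses the call boundary here, so clang would emit
// "min-legal-vector-width"="512", forcing the backend to keep v16f32
// legal instead of splitting it (splitting only one side of a call
// would break the ABI between translation units).
v16f32 add_wide(v16f32 a, v16f32 b) { return a + b; }

// No vectors anywhere: under the current scheme this function gets
// "min-legal-vector-width"="0", the clutter this patch stops emitting.
int add_scalar(int a, int b) { return a + b; }
```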

I admit the naming is unfortunate. I think I called the attribute min-legal-vector-width because it is the "minimum vector width the backend needs to consider legal", still subject to what subtarget features actually provide.

The clang code uses Max because the value is calculated by taking the max of a bunch of things.

Isn't this (inherently) X86 specific?

> Isn't this (inherently) X86 specific?

Yes it is. We could qualify the attribute emission on the target being X86?

> The use of min-legal-vector-width doesn't look great to me either. I'm more than glad to remove it entirely if we can do so without any user perceivable effects.
> I cannot agree with this change, because it neither eliminates the clutter (it makes it even worse [1]) nor is NFC to the end user.
> I think we used UINT32_MAX just to be compatible with BC files generated before the attribute was introduced. This change definitely breaks that compatibility.

What I'm getting from this is that it's only a performance hint, and it definitively doesn't matter for ABI purposes. Bitcode backwards performance compatibility is not important. That would also be recovered by having proper attribute propagation done as an optimization.

I think all of clang's handling of this should be purged, except for the part where it's passing through the user attribute.

Placing a "min-legal-vector-width" = "512" doesn't make any sense either. For one thing, we cannot place the attribute in pre-built BC files, for another 512 is the current max vector suppoted on X86, we cannot guarantee no 1024, 2048 etc. in future and we cannot change it too once compiled into BC files.

It's a test for specific behavior, with a specific configuration that exists today. It doesn't matter what this would be in the future for larger test cases.

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.

>> Isn't this (inherently) X86 specific?
> Yes it is. We could qualify the attribute emission on the target being X86?

That would hide the annoyance from me, but I still think the implementation of this concept is poor and it should be treated like other optimization hint attributes.

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.

I don't agree. There are attributes like zeroext, byref are ABI related see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.

pengfei added a comment. Edited Dec 9 2022, 12:59 AM

>> Isn't this (inherently) X86 specific?
> Yes it is. We could qualify the attribute emission on the target being X86?

That's a good idea. How about AArch64? I found there are uses of it in the test cases there.
I put up a patch, D139701, for AMDGPU only.

>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.

This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

>>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
>> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.
> This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

Those are also parameter attributes; this is a function attribute.

>>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
>> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.
> This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

If the caller and callee calculate different values and one of them is less than the width of a vector argument, it will cause an ABI break. The type legalizer in SelectionDAG will split one and not the other. Maybe the backend should check the IR for the arguments/returns and increase the min-legal-vector-width if it's less than the argument width and the argument width is supported by the AVX level.
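
A sketch of what that backend-side fixup could look like (hypothetical code, not anything that exists today; clampToSignature is an invented name):

```cpp
// Hypothetical: never let the effective width be smaller than what the
// IR signature visibly requires, regardless of the attribute value.
static unsigned clampToSignature(const llvm::Function &F, unsigned Width) {
  auto AccountFor = [&Width](llvm::Type *Ty) {
    if (auto *VT = llvm::dyn_cast<llvm::VectorType>(Ty))
      Width = std::max(
          Width, (unsigned)VT->getPrimitiveSizeInBits().getKnownMinValue());
  };
  AccountFor(F.getReturnType());
  for (const llvm::Argument &Arg : F.args())
    AccountFor(Arg.getType());
  return Width;  // the caller would still cap this at what the AVX level
                 // actually supports
}
```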

> If the caller and callee calculate different values and one of them is less than the width of a vector argument, it will cause an ABI break. The type legalizer in SelectionDAG will split one and not the other. Maybe the backend should check the IR for the arguments/returns and increase the min-legal-vector-width if it's less than the argument width and the argument width is supported by the AVX level.

Right, I think the backend needs to fix up whatever it needs for hard requirements. The current implementation, which treats it like hard ABI any time a function signature changes, isn't scalable. Every possible transform that could introduce a call site would need to handle this.