This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Only upgrade align for alloca
ClosedPublic

Authored by bjope on Jun 7 2023, 10:36 AM.

Details

Summary

In commit 2be0abb7fe72ed453 (D149893) the load store vectorized was
reimplemented. One thing that can happen with the new LSV is that
it can increase the align of alloca and global objects. However,
the code comments indicate that the intention only was to increase
alignment of alloca.
This patch is using stripInBoundsOffsets to analyse if the load/store
actually is accessing an alloca. And then we only try to change the
align if we find an alloca instruction. This way the actual code will
match better with code comments (and we won't see that alignment is
changed to 4 bytes, for global variables, just because the load
store vectorizer assumes that it is fine for align stack variables
at a four byte boundary).

Diff Detail

Event Timeline

bjope created this revision.Jun 7 2023, 10:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 10:36 AM
bjope requested review of this revision.Jun 7 2023, 10:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 10:36 AM
tra added a comment.Jun 7 2023, 11:59 AM

Can you think of a test that would directly demonstrate effect of the patch on the alignment?

I think the current test does it indirectly. We manage to vectorize loads/stores when we promote the alignment (though the test cases are inexplicably labeled UNALIGNED), and do it per-element otherwise.
At the very least, we should add a comment there explaining what the vectorization or lack of it indicates.

llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll
291–299

Nit: I'd suggest keeping the common CHECK-LABEL, instead of replicating it for aligned/unaligned variants.

bjope updated this revision to Diff 529426.Jun 7 2023, 1:48 PM

Add a test case that explicitly verify that the global variables alignment isn't updated.

llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll
291–299

I only used the update script to re-generated the checks, and this is what I got.

If this test file shouldn't be based on automatically generated test checks, then I think you need to remove the first line saying

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

Otherwise people will end up updating this, e.g. when using the -u flag in utils/update_test_checks.py which afaik triggers on that line.

bjope added a comment.Jun 7 2023, 1:51 PM

Can you think of a test that would directly demonstrate effect of the patch on the alignment?

I added an explicit check it in a separate test file. Maybe it can be moved into the adjust-alloca-alignment.ll file, if using something else instead of --match-full-lines to ensure that there isn't anything else after the "align 1" (such as "align 16").

bjope added inline comments.Jun 7 2023, 1:54 PM
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/dont-adjust-globalobj-alignment.ll
7

I'll change this to [8 x i16] or similar. A bit ugly to use i32 here and then store using i16.

bjope updated this revision to Diff 529430.Jun 7 2023, 1:56 PM

Update the array definition in the new test case.

jlebar accepted this revision.Jun 7 2023, 7:48 PM

Looks reasonable to me. Thank you, and sorry for the breakage.

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
805

getOrEnforceKnownAlignment will merely stripPointerCasts, which is weaker than stripInBoundsOffsets. Presumably we should do the same here?

This revision is now accepted and ready to land.Jun 7 2023, 7:48 PM
bjope added inline comments.Jun 8 2023, 1:22 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
805

I agree that it makes sense to use the same kind of strip here.

When using stripPointerCasts I got some more diffs in the @load_alloca16_unknown_offset_align1_i32 test case, so then I changed this to use the stronger stripInBoundsOffsets to make the impact of my fixup smaller.

There is however already a FIXME for that specific test:

; FIXME: Although the offset is unknown here, we know it is a multiple
; of the element size, so should still be align 4

So I think that we will get back to a state when that FIXME makes sense (when the test show an opportunity to improve something since we do not vectorize the ALIGNED case any longer).

bjope updated this revision to Diff 529527.Jun 8 2023, 1:31 AM

Addressed review feedback (now using stripPointerCasts instead of stripInBoundsOffsets). Only the @load_alloca16_unknown_offset_align1_i32 test case that was impacted.

bjope added inline comments.Jun 8 2023, 1:35 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
805

Sorry, that FIXME was about @load_unknown_offset_align1_i32 (so not the same test case).

Anyway, I've changed to stripPointerCasts, and then @load_alloca16_unknown_offset_align1_i32 was impacted as it no longer vectorize the ALIGNED case. I don't know if that test case is a bit contrived anyway (why would it use align 1 on those i32 loads in the first place).

uabelho added a subscriber: uabelho.Jun 8 2023, 1:58 AM
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.Jun 9 2023, 6:43 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
805

Defer the strip pointer casts until after the cheaper getAllocAddrSpace checks?

bjope added inline comments.Jun 9 2023, 7:23 AM
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
805

Defer the strip pointer casts until after the cheaper getAllocAddrSpace checks?

Right, I can fix that!

Although I'm not sure if we really need to check the getAllocaAddrSpace now when we check that it is an alloca. But if doing that check first, then it at least is possible to avoid the strip pointer cast for targets with multiple address spaces.