In commit 2be0abb7fe72ed453 (D149893) the load store vectorized was
reimplemented. One thing that can happen with the new LSV is that
it can increase the align of alloca and global objects. However,
the code comments indicate that the intention only was to increase
alignment of alloca.
This patch is using stripInBoundsOffsets to analyse if the load/store
actually is accessing an alloca. And then we only try to change the
align if we find an alloca instruction. This way the actual code will
match better with code comments (and we won't see that alignment is
changed to 4 bytes, for global variables, just because the load
store vectorizer assumes that it is fine for align stack variables
at a four byte boundary).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you think of a test that would directly demonstrate effect of the patch on the alignment?
I think the current test does it indirectly. We manage to vectorize loads/stores when we promote the alignment (though the test cases are inexplicably labeled UNALIGNED), and do it per-element otherwise.
At the very least, we should add a comment there explaining what the vectorization or lack of it indicates.
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll | ||
---|---|---|
291–299 | Nit: I'd suggest keeping the common CHECK-LABEL, instead of replicating it for aligned/unaligned variants. |
Add a test case that explicitly verify that the global variables alignment isn't updated.
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll | ||
---|---|---|
291–299 | I only used the update script to re-generated the checks, and this is what I got. If this test file shouldn't be based on automatically generated test checks, then I think you need to remove the first line saying ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py Otherwise people will end up updating this, e.g. when using the -u flag in utils/update_test_checks.py which afaik triggers on that line. |
I added an explicit check it in a separate test file. Maybe it can be moved into the adjust-alloca-alignment.ll file, if using something else instead of --match-full-lines to ensure that there isn't anything else after the "align 1" (such as "align 16").
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/dont-adjust-globalobj-alignment.ll | ||
---|---|---|
7 | I'll change this to [8 x i16] or similar. A bit ugly to use i32 here and then store using i16. |
Looks reasonable to me. Thank you, and sorry for the breakage.
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
805 | getOrEnforceKnownAlignment will merely stripPointerCasts, which is weaker than stripInBoundsOffsets. Presumably we should do the same here? |
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
805 | I agree that it makes sense to use the same kind of strip here. When using stripPointerCasts I got some more diffs in the @load_alloca16_unknown_offset_align1_i32 test case, so then I changed this to use the stronger stripInBoundsOffsets to make the impact of my fixup smaller. There is however already a FIXME for that specific test: ; FIXME: Although the offset is unknown here, we know it is a multiple ; of the element size, so should still be align 4 So I think that we will get back to a state when that FIXME makes sense (when the test show an opportunity to improve something since we do not vectorize the ALIGNED case any longer). |
Addressed review feedback (now using stripPointerCasts instead of stripInBoundsOffsets). Only the @load_alloca16_unknown_offset_align1_i32 test case that was impacted.
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
805 | Sorry, that FIXME was about @load_unknown_offset_align1_i32 (so not the same test case). Anyway, I've changed to stripPointerCasts, and then @load_alloca16_unknown_offset_align1_i32 was impacted as it no longer vectorize the ALIGNED case. I don't know if that test case is a bit contrived anyway (why would it use align 1 on those i32 loads in the first place). |
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
805 | Defer the strip pointer casts until after the cheaper getAllocAddrSpace checks? |
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
805 |
Right, I can fix that! Although I'm not sure if we really need to check the getAllocaAddrSpace now when we check that it is an alloca. But if doing that check first, then it at least is possible to avoid the strip pointer cast for targets with multiple address spaces. |
getOrEnforceKnownAlignment will merely stripPointerCasts, which is weaker than stripInBoundsOffsets. Presumably we should do the same here?