This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] recompute alignments for loads and stores of updated globals
ClosedPublic

Authored by spatel on May 15 2021, 5:44 AM.

Details

Summary

GlobalOpt can slice structs/arrays and change GEPs in the process, but it was not updating alignments for load/store users. This eventually causes the crashing seen in:
https://llvm.org/PR49661
https://llvm.org/PR50253

On x86, this required SLP+codegen to create an aligned vector store on an invalid address. The bugs would be easier to demonstrate on a target with stricter alignment requirements.

This is my first time looking at this pass, so I'm not sure if this is a complete solution. The alignment updating code is adapted from InstCombine, so I assume that part is tested and good.

Diff Detail

Event Timeline

spatel created this revision.May 15 2021, 5:44 AM
spatel requested review of this revision.May 15 2021, 5:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2021, 5:44 AM

I think the bug is the other way around. For each scalar we are going to split off of aggregate,
we should first determine the alignment for scalar by determining what largest legal alignment
could it had as part of the aggregate (based on the alignment of the outer type, and offset).
Then, we shouldn't need to update uses, because by then we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

Otherwise, aren't we going to loose overalignment of the aggregate scalars?

I think the bug is the other way around. For each scalar we are going to split off of aggregate,
we should first determine the alignment for scalar by determining what largest legal alignment
could it had as part of the aggregate (based on the alignment of the outer type, and offset).
Then, we shouldn't need to update uses, because by then we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

I think we are already correctly finding the alignment of the updated global itself (see inline code comment). But I'm not sure how that can propagate to potentially stale alignment specifiers on the users other than what I proposed here.

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

Let me know if I'm not seeing it correctly (not too familiar with this!).

llvm/lib/Transforms/IPO/GlobalOpt.cpp
555–557

Alignment of the new global is updated here.

llvm/test/Transforms/GlobalOpt/globalsra-align.ll
40–41

This is accessing element 8 (7 + 1) of the 16-byte aligned global with 4-byte elements, so "align 16" is correct and the best that it can be for the original code, but that's wrong after we strip away the outer array and only have [7 x i32*]. This is the test that most closely models what is happening in the bug reports.

I think the bug is the other way around. For each scalar we are going to split off of aggregate,
we should first determine the alignment for scalar by determining what largest legal alignment
could it had as part of the aggregate (based on the alignment of the outer type, and offset).
Then, we shouldn't need to update uses, because by then we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

I think we are already correctly finding the alignment of the updated global itself (see inline code comment). But I'm not sure how that can propagate to potentially stale alignment specifiers on the users other than what I proposed here.

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

Let me know if I'm not seeing it correctly (not too familiar with this!).

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

I don't think it is possible to overalign the global / base pointer to make this work. For example, take the motivating case:

@a = global [2 x [7 x i32*]] align 16

Assume 32-bit alignment for each pointer element in that array. We have (7 + 1) * 4 = 32-bytes ahead of the 8th element, so the original access to element [1][1] is guaranteed to be "align 16" bytes.
Now we strip off the outer array specifier:

@a0 = global [7 x i32*] align ??

What over-alignment of the base pointer can we use to make the access of element [1] of this new array continue to have "align 16"?
Do we want to use padding to make this work? Not sure how to specify that.

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

I don't think it is possible to overalign the global / base pointer to make this work. For example, take the motivating case:

@a = global [2 x [7 x i32*]] align 16

Assume 32-bit alignment for each pointer element in that array. We have (7 + 1) * 4 = 32-bytes ahead of the 8th element, so the original access to element [1][1] is guaranteed to be "align 16" bytes.
Now we strip off the outer array specifier:

@a0 = global [7 x i32*] align ??

What over-alignment of the base pointer can we use to make the access of element [1] of this new array continue to have "align 16"?
Do we want to use padding to make this work? Not sure how to specify that.

Thank you for spelling this out.
Indeed, if we want to align an offset element, then we indeed need padding.
I'm conflicted here.

lebedev.ri accepted this revision.May 19 2021, 7:06 AM

Thinking about it more, i guess not having an obvious miscompile would be a good starting point.
We can deal with padding, if need be, later.
Thanks.

This revision is now accepted and ready to land.May 19 2021, 7:06 AM

Thinking about it more, i guess not having an obvious miscompile would be a good starting point.
We can deal with padding, if need be, later.

Thanks! I have no sense of whether this is a common/impactful transform, but yes, the fuzzers found this hole in the logic using a simple C program, so we should get it fixed.

MaskRay accepted this revision.May 19 2021, 10:39 AM

Looks great!

llvm/test/Transforms/GlobalOpt/globalsra-align.ll
65

Consider adding a load test.

define i32* @reduce_align_2(i32* %y) {
; CHECK-LABEL: @reduce_align_2(
; CHECK-NEXT:    store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 2), align 8
; CHECK-NEXT:    ret i32* null
;
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 0), align 1
  store i32* %y, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 1, i64 2), align 4
  ret i32* %x
}

define i32* @reduce_align_3(i32* %y) {
; CHECK-LABEL: @reduce_align_3(
; CHECK-NEXT:    store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 3), align 4
; CHECK-NEXT:    ret i32* null
;
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 0), align 1
  store i32* %y, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 1, i64 3), align 8
  ret i32* %x
}

define i32* @reduce_align_4() {
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 2), align 1
  ret i32* %x
}
spatel added inline comments.May 20 2021, 8:22 AM
llvm/test/Transforms/GlobalOpt/globalsra-align.ll
65

I'll add "externally_initialized" to the global declaration and adjust the tests. The existing tests have loads, but they are all getting folded to null.