This is an archive of the discontinued LLVM Phabricator instance.

[IRGen] Add an alignment attribute to underaligned sret parameters
ClosedPublic

Authored by erik.pilkington on Feb 6 2020, 4:52 PM.

Details

Summary

Without this, LLVM will assume the pointer has the type's alignment, which is undefined if the value is actually underaligned.

rdar://58316406

Diff Detail

Event Timeline

erik.pilkington created this revision.Feb 6 2020, 4:52 PM
rjmccall added inline comments.Feb 6 2020, 6:44 PM
clang/lib/CodeGen/CGCall.cpp
2077

Why only when under-aligned? Just to avoid churning tests? I think we should apply this unconditionally.

scanon added inline comments.Feb 7 2020, 8:18 AM
clang/lib/CodeGen/CGCall.cpp
2077

On mainstream architectures today, there's rarely a benefit to knowing about higher alignment (e.g. MOVUPS is just as fast as MOVAPS if the address is actually aligned), so we won't see significant perf wins from preserving over-alignment in most cases, but it also doesn't cost us anything AFAICT and could deliver wins in some specific cases (e.g. AVX on SNB and IVB, where I think we split underaligned 256b stores into two 128b chunks).

So, yeah, I think we ought to simply unconditionally add the alignment to the sret.

dexonsmith added inline comments.Feb 7 2020, 10:30 AM
clang/lib/CodeGen/CGCall.cpp
2077

@rjmccall, are you seeing a reason to add the attribute when the implicit one is correct (neither over-aligned nor under-aligned)? If so, it seems to me like the added noise would make the IR harder to read.

rjmccall added inline comments.Feb 7 2020, 10:50 AM
clang/lib/CodeGen/CGCall.cpp
2077

Well, first, I think we're going to end up needing an alignment there in all cases eventually because of opaque pointer types. But I also think it's just cleaner and more testable to add the attribute in all cases instead of leaving it off when the IR type happens to have the right alignment, which can be very sensitive to the target.

rjmccall added inline comments.Feb 7 2020, 10:57 AM
clang/lib/CodeGen/CGCall.cpp
2077

In general, I think frontends should *never* be leaving it up to LLVM to infer alignment based on IR types, and this is part-and-parcel with that.

dexonsmith added inline comments.Feb 7 2020, 10:58 AM
clang/lib/CodeGen/CGCall.cpp
2077

I think we're going to end up needing an alignment there in all cases eventually because of opaque pointer types.

That's a great point. In that case I don't have a strong opinion.

Add alignment attribute to all sret parameters.

rjmccall accepted this revision.Mar 23 2020, 11:05 AM

I'm just going to assume the test changes look good; thank you for taking the time to do this.

This revision is now accepted and ready to land.Mar 23 2020, 11:05 AM

I think we should remove the LangRef rule that says "sret" pointers have to be dereferenceable/naturally aligned, and let the frontend add explicit aligned/dereferenceable markings where appropriate. (At that point, sret has no target-independent meaning; it's just to manipulate the target ABI.) It would make the IR easier to understand, and resolves the interaction with opaque pointers.

That isn't to say we shouldn't make this change in clang; clang should do this, but at that point it would just be a performance enhancement, not required for correctness.

I think we should remove the LangRef rule that says "sret" pointers have to be dereferenceable/naturally aligned, and let the frontend add explicit aligned/dereferenceable markings where appropriate. (At that point, sret has no target-independent meaning; it's just to manipulate the target ABI.) It would make the IR easier to understand, and resolves the interaction with opaque pointers.

That might be reasonable, yeah. And yeah, making the dereferenceable assumption explicit is going to be necessary with opaque pointer types eventually.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2020, 12:55 PM
Herald added a subscriber: jrtc27. · View Herald Transcript
nikic added a subscriber: nikic.Mar 24 2020, 1:55 PM

For the record, this change caused an 1.5% compile-time regression on tramp3d-v4 (http://llvm-compile-time-tracker.com/compare.php?from=43a6d285bfead762ac472a6e62beedc9f88bce89&to=de98cf92e301ab559a7417f1eca5cfa53624c9e1&stat=instructions). As there was also a 0.9% increase in code size, I assume that adding the alignment ends up having a non-trivial impact on optimization behavior for this benchmark and we end up generating more IR. So most likely this is fine and there's nothing to be done about it.

If it's just tramp3d-v4, I'm not that concerned... but that's a weird result. On x86 in particular, alignment markings have almost no effect on optimization, generally.

nikic added a comment.Mar 25 2020, 2:38 PM

If it's just tramp3d-v4, I'm not that concerned... but that's a weird result. On x86 in particular, alignment markings have almost no effect on optimization, generally.

I've just looked at the IR diff for tramp3d-v4 and it turns out that the root cause is an old friend of mine: The insertion of alignment assumptions during inlining (https://github.com/llvm/llvm-project/blob/b58902bc72c2b479b5ed27ec0d3422ba9782edbb/llvm/lib/Transforms/Utils/InlineFunction.cpp#L1139-L1173). That is, the IR now contains many instances of this sequence:

%ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
%maskedptr = and i64 %ptrint, 3
%maskcond = icmp eq i64 %maskedptr, 0
tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information. From a cursory look I cannot tell whether these additional assumes also regress optimization (due to multi-use), but given the size increase on the final binary it seems pretty likely that this is the case.

This preservation of alignment during inlining is the reason why we used to not emit alignment information for pointer arguments in Rust for a long time: It caused serious regressions in optimization and increased compile-time. Nowadays we do emit alignment information, but set -preserve-alignment-assumptions-during-inlining=false to prevent this inlining behavior.

I think for the purposes of this revision, this means that we should probably either a) default preserve-alignment-assumptions-during-inlining to false (I would prefer this) or b) not emit the alignment unless it is smaller than the ABI alignment (I guess this was what this patch originally did?)

Yeah, if emitting alignment assumptions in inlining is causing regressions when frontends provide better information, those assumptions need to be reverted until they can be fixed.

That makes sense.

I would slightly lean towards not generating the assumptions, given the current state of assumptions and the likely benefit in this context.

hgreving added a subscriber: hgreving.EditedApr 1 2020, 4:28 PM

If it's just tramp3d-v4, I'm not that concerned... but that's a weird result. On x86 in particular, alignment markings have almost no effect on optimization, generally.

I've just looked at the IR diff for tramp3d-v4 and it turns out that the root cause is an old friend of mine: The insertion of alignment assumptions during inlining (https://github.com/llvm/llvm-project/blob/b58902bc72c2b479b5ed27ec0d3422ba9782edbb/llvm/lib/Transforms/Utils/InlineFunction.cpp#L1139-L1173). That is, the IR now contains many instances of this sequence:

%ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
%maskedptr = and i64 %ptrint, 3
%maskcond = icmp eq i64 %maskedptr, 0
tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information. From a cursory look I cannot tell whether these additional assumes also regress optimization (due to multi-use), but given the size increase on the final binary it seems pretty likely that this is the case.

This preservation of alignment during inlining is the reason why we used to not emit alignment information for pointer arguments in Rust for a long time: It caused serious regressions in optimization and increased compile-time. Nowadays we do emit alignment information, but set -preserve-alignment-assumptions-during-inlining=false to prevent this inlining behavior.

I think for the purposes of this revision, this means that we should probably either a) default preserve-alignment-assumptions-during-inlining to false (I would prefer this) or b) not emit the alignment unless it is smaller than the ABI alignment (I guess this was what this patch originally did?)

We are having a problem with this very issue on a target not supporting a stack, with sroa bailing due to above, in our case causing a crash. Our only workaround for this is currently preserve-alignment-assumptions-during-inlining to false. We were actually wondering if this is causing performance issues on targets that do support a stack.