Without this, LLVM will assume the pointer has the type's alignment, which is undefined if the value is actually underaligned.
rdar://58316406
Paths
| Differential D74183
[IRGen] Add an alignment attribute to underaligned sret parameters ClosedPublic Authored by erik.pilkington on Feb 6 2020, 4:52 PM.
Details Summary Without this, LLVM will assume the pointer has the type's alignment, which is undefined if the value is actually underaligned. rdar://58316406
Diff Detail Event TimelineHerald added subscribers: ributzka, dexonsmith, jkorous. · View Herald TranscriptFeb 6 2020, 4:52 PM
Comment Actions I'm just going to assume the test changes look good; thank you for taking the time to do this. This revision is now accepted and ready to land.Mar 23 2020, 11:05 AM Comment Actions I think we should remove the LangRef rule that says "sret" pointers have to be dereferenceable/naturally aligned, and let the frontend add explicit aligned/dereferenceable markings where appropriate. (At that point, sret has no target-independent meaning; it's just to manipulate the target ABI.) It would make the IR easier to understand, and resolves the interaction with opaque pointers. That isn't to say we shouldn't make this change in clang; clang should do this, but at that point it would just be a performance enhancement, not required for correctness. Comment Actions
That might be reasonable, yeah. And yeah, making the dereferenceable assumption explicit is going to be necessary with opaque pointer types eventually. Closed by commit rGde98cf92e301: [CodeGen] Add an alignment attribute to all sret parameters (authored by erik.pilkington). · Explain WhyMar 24 2020, 12:55 PM This revision was automatically updated to reflect the committed changes. Comment Actions For the record, this change caused an 1.5% compile-time regression on tramp3d-v4 (http://llvm-compile-time-tracker.com/compare.php?from=43a6d285bfead762ac472a6e62beedc9f88bce89&to=de98cf92e301ab559a7417f1eca5cfa53624c9e1&stat=instructions). As there was also a 0.9% increase in code size, I assume that adding the alignment ends up having a non-trivial impact on optimization behavior for this benchmark and we end up generating more IR. So most likely this is fine and there's nothing to be done about it. Comment Actions If it's just tramp3d-v4, I'm not that concerned... but that's a weird result. On x86 in particular, alignment markings have almost no effect on optimization, generally. Comment Actions
I've just looked at the IR diff for tramp3d-v4 and it turns out that the root cause is an old friend of mine: The insertion of alignment assumptions during inlining (https://github.com/llvm/llvm-project/blob/b58902bc72c2b479b5ed27ec0d3422ba9782edbb/llvm/lib/Transforms/Utils/InlineFunction.cpp#L1139-L1173). That is, the IR now contains many instances of this sequence: %ptrint = ptrtoint %class.GuardLayers* %guards_m to i64 %maskedptr = and i64 %ptrint, 3 %maskcond = icmp eq i64 %maskedptr, 0 tail call void @llvm.assume(i1 %maskcond) to preserve the alignment information. From a cursory look I cannot tell whether these additional assumes also regress optimization (due to multi-use), but given the size increase on the final binary it seems pretty likely that this is the case. This preservation of alignment during inlining is the reason why we used to not emit alignment information for pointer arguments in Rust for a long time: It caused serious regressions in optimization and increased compile-time. Nowadays we do emit alignment information, but set -preserve-alignment-assumptions-during-inlining=false to prevent this inlining behavior. I think for the purposes of this revision, this means that we should probably either a) default preserve-alignment-assumptions-during-inlining to false (I would prefer this) or b) not emit the alignment unless it is smaller than the ABI alignment (I guess this was what this patch originally did?) Comment Actions Yeah, if emitting alignment assumptions in inlining is causing regressions when frontends provide better information, those assumptions need to be reverted until they can be fixed. Comment Actions That makes sense. I would slightly lean towards not generating the assumptions, given the current state of assumptions and the likely benefit in this context. Comment Actions
We are having a problem with this very issue on a target not supporting a stack, with sroa bailing due to above, in our case causing a crash. Our only workaround for this is currently preserve-alignment-assumptions-during-inlining to false. We were actually wondering if this is causing performance issues on targets that do support a stack.
Revision Contents
Diff 251806 clang/lib/CodeGen/CGCall.cpp
clang/test/CodeGen/2006-05-19-SingleEltReturn.c
clang/test/CodeGen/aarch64-varargs.c
clang/test/CodeGen/aggregate-assign-call.c
clang/test/CodeGen/aligned-sret.c
clang/test/CodeGen/arc/arguments.c
clang/test/CodeGen/arm-aapcs-vfp.c
clang/test/CodeGen/arm-homogenous.c
clang/test/CodeGen/arm-neon-vld.c
clang/test/CodeGen/arm-varargs.c
clang/test/CodeGen/arm-vector-arguments.c
clang/test/CodeGen/arm-vfp16-arguments.c
clang/test/CodeGen/arm-vfp16-arguments2.cpp
clang/test/CodeGen/arm64-arguments.c
clang/test/CodeGen/arm64-microsoft-arguments.cpp
clang/test/CodeGen/arm64_32.c
clang/test/CodeGen/arm_neon_intrinsics.c
clang/test/CodeGen/blocks.c
clang/test/CodeGen/c11atomics-ios.c
clang/test/CodeGen/c11atomics.c
clang/test/CodeGen/lanai-arguments.c
clang/test/CodeGen/le32-arguments.c
clang/test/CodeGen/mcu-struct-return.c
clang/test/CodeGen/mingw-long-double.c
clang/test/CodeGen/mips-zero-sized-struct.c
clang/test/CodeGen/mips64-padding-arg.c
clang/test/CodeGen/ms_abi.c
clang/test/CodeGen/ppc64-align-struct.c
clang/test/CodeGen/ppc64-elf-abi.c
clang/test/CodeGen/ppc64-qpx-vector.c
clang/test/CodeGen/ppc64-soft-float.c
clang/test/CodeGen/ppc64-vector.c
clang/test/CodeGen/ppc64le-aggregates.c
clang/test/CodeGen/ppc64le-f128Aggregates.c
clang/test/CodeGen/regparm-struct.c
clang/test/CodeGen/renderscript.c
clang/test/CodeGen/riscv32-ilp32-abi.c
clang/test/CodeGen/riscv32-ilp32-ilp32f-abi.c
clang/test/CodeGen/riscv32-ilp32-ilp32f-ilp32d-abi.c
clang/test/CodeGen/riscv32-ilp32d-abi.c
clang/test/CodeGen/riscv32-ilp32f-abi.c
clang/test/CodeGen/riscv32-ilp32f-ilp32d-abi.c
clang/test/CodeGen/riscv64-lp64-abi.c
clang/test/CodeGen/riscv64-lp64-lp64f-abi.c
clang/test/CodeGen/riscv64-lp64-lp64f-lp64d-abi.c
clang/test/CodeGen/riscv64-lp64d-abi.c
clang/test/CodeGen/sparcv9-abi.c
clang/test/CodeGen/struct-passing.c
clang/test/CodeGen/systemz-abi-vector.c
clang/test/CodeGen/systemz-abi.c
clang/test/CodeGen/systemz-abi.cpp
clang/test/CodeGen/systemz-inline-asm.c
clang/test/CodeGen/vectorcall.c
clang/test/CodeGen/wasm-arguments.c
clang/test/CodeGen/wasm-varargs.c
clang/test/CodeGen/windows-struct-abi.c
clang/test/CodeGen/x86_32-arguments-darwin.c
clang/test/CodeGen/x86_32-arguments-iamcu.c
clang/test/CodeGen/x86_64-arguments-nacl.c
clang/test/CodeGen/x86_64-arguments-win32.c
clang/test/CodeGen/x86_64-arguments.c
clang/test/CodeGenCXX/arm-cc.cpp
clang/test/CodeGenCXX/builtin-source-location.cpp
clang/test/CodeGenCXX/call-with-static-chain.cpp
clang/test/CodeGenCXX/conditional-gnu-ext.cpp
clang/test/CodeGenCXX/cxx1z-copy-omission.cpp
clang/test/CodeGenCXX/cxx1z-lambda-star-this.cpp
clang/test/CodeGenCXX/exceptions.cpp
clang/test/CodeGenCXX/homogeneous-aggregates.cpp
clang/test/CodeGenCXX/lambda-expressions.cpp
clang/test/CodeGenCXX/microsoft-abi-byval-sret.cpp
clang/test/CodeGenCXX/microsoft-abi-byval-thunks.cpp
clang/test/CodeGenCXX/microsoft-abi-cdecl-method-sret.cpp
clang/test/CodeGenCXX/microsoft-abi-eh-cleanups.cpp
clang/test/CodeGenCXX/microsoft-abi-sret-and-byval.cpp
clang/test/CodeGenCXX/microsoft-abi-vmemptr-conflicts.cpp
clang/test/CodeGenCXX/regcall.cpp
clang/test/CodeGenCXX/stack-reuse-miscompile.cpp
clang/test/CodeGenCXX/stack-reuse.cpp
clang/test/CodeGenCXX/temporaries.cpp
clang/test/CodeGenCXX/thiscall-struct-return.cpp
clang/test/CodeGenCXX/thunk-returning-memptr.cpp
clang/test/CodeGenCXX/thunks.cpp
clang/test/CodeGenCXX/trivial_abi.cpp
clang/test/CodeGenCXX/unknown-anytype.cpp
clang/test/CodeGenCXX/wasm-args-returns.cpp
clang/test/CodeGenCXX/x86_32-arguments.cpp
clang/test/CodeGenCXX/x86_64-arguments.cpp
clang/test/CodeGenCoroutines/coro-await.cpp
clang/test/CodeGenCoroutines/coro-gro-nrvo.cpp
clang/test/CodeGenObjC/arc.m
clang/test/CodeGenObjC/direct-method.m
clang/test/CodeGenObjC/nontrivial-c-struct-exception.m
clang/test/CodeGenObjC/objc-non-trivial-struct-nrvo.m
clang/test/CodeGenObjC/stret-1.m
clang/test/CodeGenObjC/weak-in-c-struct.m
clang/test/CodeGenObjCXX/objc-struct-cxx-abi.mm
clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
clang/test/CodeGenOpenCLCXX/addrspace-of-this.cl
clang/test/Modules/templates.mm
|
Why only when under-aligned? Just to avoid churning tests? I think we should apply this unconditionally.