As it is being noted in D99249, lack of alignment information on this has been preventing LICM from happening.
For some time now, lack of alignment attribute does *not* imply natural alignment, but an alignment of 1.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGCall.cpp | ||
---|---|---|
2317 | Shouldn't the alignment also be valid for non-zero address spaces? This reminds me that I need to upstream our local CHERI changes to take the branch for all address spaces with an all-zero null pointer representation. |
clang/lib/CodeGen/CGCall.cpp | ||
---|---|---|
2317 |
Probably. It should also be fine when null pointer is valid. |
I'll land this tomorrow if no one wants to object.
As discussed in D99791, this is an obvious, and legal, bugfix.
As a heads up, I'm seeing segfaults on internal code as a result of this change, as well as errors in Eigen's unalignedassert.cpp test (specifically, this line asserts: https://github.com/madlib/eigen/blob/master/test/unalignedassert.cpp#L151).
Would be good to have a small standalone reproducer.
Not really sure how we can end up with a misaligned this, but it sounds like UB.
Eigen's case is UB -- the test case is verifying that the constructor of the Matrix type throws an exception if called on an under-aligned instance, I think in an effort to make the UB noisier for their users. It already didn't work in GCC, and they had a "workaround", here, https://github.com/madlib/eigen/blob/fa14b05455c9d9737ae577f686188ef358df9020/Eigen/src/Core/DenseStorage.h#L63
It's unsurprising that this change in Clang causes the same code to fail in Clang, too.
Indeed, it's looking like all of the various segfaults are resulting from undefined behavior, just like the Eigen assert was (per @jyknight's comment). One of the segfaults is in the OpenJDK runtime -- albeit our internal copy, so it's possible it might not be in the external versions -- so that's fun. Luckily it shows up in the bootstrapping part of the build, rather than lying in wait to bite people after it's deployed.
In any case, thanks for the quick reply, and I'll figure out a small reproducer if we find something that isn't UB.
Nono, you misunderstand, i want the samples *with* UB.
I will then revert this, and add UBSan check to catch that UB first.
Oh! Okay, will do. Do you still want the samples even if UBSan catches them?
Meanwhile, would you be open to providing a gating flag to turn this off, so we can roll it out generally while turning it off for the specific things that have issues until we can fix them? So far, what we're finding is caught by UBSan but not very trivial to fix.
I'm interested in the cases that *aren't* currently being caught by UBSan, or are caught too late.
Meanwhile, would you be open to providing a gating flag to turn this off, so we can roll it out generally while turning it off for the specific things that have issues until we can fix them? So far, what we're finding is caught by UBSan but not very trivial to fix.
It seems like there's a bug with vtable thunks getting the wrong information. This appears to be a pre-existing bug, but this change has caused it to start actively breaking code.
Test case:
class Base1 { virtual void Foo1(); }; class Base2 { virtual void Foo2(); }; class alignas(16) Obj : public Base1, public Base2 { void Foo1() override; void Foo2() override; ~Obj(); }; void Obj::Foo1() {} void Obj::Foo2() {}
emits three method definitions:
define dso_local void @_ZN3Obj4Foo1Ev(%class.Obj* nonnull align 16 dereferenceable(16) %0) unnamed_addr #0 align 2 !dbg !7 { define dso_local void @_ZN3Obj4Foo2Ev(%class.Obj* nonnull align 16 dereferenceable(16) %0) unnamed_addr #0 align 2 !dbg !25 { define dso_local void @_ZThn8_N3Obj4Foo2Ev(%class.Obj* nonnull align 16 dereferenceable(16) %0) unnamed_addr #2 align 2 !dbg !29 {
(See https://godbolt.org/z/MxhYMe1q7, for now at least)
That third method declaration is bogus -- its argument is _not_ an Obj* at all! In fact, it's pointing at Obj + 8 -- at the embedded Base2 object! As such, align 16 is incorrect, as is dereferenceable(16). The additional 8 bytes of dereferenceable claim apparently hasn't broken anything noticeable, but the alignment claim is causing actual trouble.
As such, I suggest to revert this change, separately commit a fix to that underlying bug, and then re-submit this change after that.
The OpenJDK bug was UB in the OpenJDK code: https://bugs.openjdk.java.net/browse/JDK-8229258 already fixed in JDK14. (But not backported to JDK 11 LTS, which is the version Brooks found an error in above.)
They probably need to backport that patch.
My only confusion there is that I had thought GCC had the same optimizations around "this" alignment, so it's a bit strange to me that this hadn't been causing miscompiles in OpenJDK compiled by GCC, too. (Although, I note the OpenJDK build is passing -fno-delete-null-pointer-checks)
Temporarily reverted in 6270b3a1eafaba4279e021418c5a2c5a35abc002.
Eagerly awaiting fix for that bug.
ping @rsmith @rjmccall @aaron.ballman
To spell this out explicitly: i'm afraid i'm not really sure how to fix this preexisting thunk irgen bug.
I think it should be fixed in CodeGenVTables::maybeEmitThunk(), likely in arrangeGlobalDeclaration().
Or is this wrong on the AST level, too?
+1 on eagerly awaiting a fix. a 3% regression on astar (AArch64, LTO) bisects to @lebedev.ri 's revert: https://reviews.llvm.org/rG6270b3a1eafaba4279e021418c5a2c5a35abc002 .
The AST-level ThunkInfo doesn't track the this parameter type at all, so I think this is probably a CodeGen-specific concern. I think this is a bug in maybeEmitThunk:
const CGFunctionInfo &FnInfo = IsUnprototyped ? CGM.getTypes().arrangeUnprototypedMustTailThunk(MD) : CGM.getTypes().arrangeGlobalDeclaration(GD);
The problem here is that GD refers to the derived-class declaration (which takes a Derived*) but what we actually want is the base class declaration corresponding to the vtable entry for which we're trying to emit a thunk. So I think we want to find the class that introduced the vtable slot, look up the corresponding function in that class (CXXRecordDecl::getCorrespondingMethodInClass(ForVTable, true)), and arrange a declaration of *that* function.
I don't think there's an easy way to find the vtable slot owner from addVTableComponent, so perhaps the easiest thing to do would be to track either the effective this class or that class's corresponding method declaration in ThunkInfo.
Note that this is also wrong for covariant return types. For example:
struct A {}; struct alignas(32) B : virtual A { char c[32]; }; struct Pad { char c[7]; }; struct C : B, Pad, virtual A {}; struct X { virtual A &f(); }; struct U { virtual ~U(); }; C c; struct Y : U, X { virtual B &f() override { return c; } }; Y y;
... generates a thunk that claims its return value is align 32 dereferenceable(40) %struct.B*, which is nonsense; the return value should be align 1 dereferenceable(1) %struct.A* because (as happens in this example) the A virtual base subobject could be at a different offset and and alignment from the B object. Using X::f instead of Y::f as the method whose declaration we arrange should fix that too.
I agree. The arrangement logic is assuming that the exact pointer type doesn't matter because it's all just a pointer in the end, but of course that breaks down when we start inferring attributes.
Pointer authentication also really wants to know the original declaration which gave rise to a v-table slot, and it's not the only thing. Maybe we should just change the v-table data structure to store that.
I have tried to fix this in D100388, but i don't quite understand all the moving parts,
so it's still not in a working state (even though it fixed those two tests),
so i'm not sure how productive it is to develop it via a review feedback :/
Oh, i forgot that this review was here.
I've just relanded this in rG16d03818412415c56efcd482d18c0cbdf712524c,
after workarounding the thunk issue in rGa624cec56d4bf61c1f3cb7daf2d27dac59c56fa4.
It would be good to fix it properly via D100388, but well,
a month with little progress..
Shouldn't the alignment also be valid for non-zero address spaces?
This reminds me that I need to upstream our local CHERI changes to take the branch for all address spaces with an all-zero null pointer representation.