This is an archive of the discontinued LLVM Phabricator instance.

SROA produces miscompiled code for bitfield access on big-endian targets
Abandoned, Public

Authored by labrinea on Jun 10 2015, 6:16 AM.

Details

Reviewers

chandlerc
dexonsmith

Summary

Attachment: test.c (265 B)

The attached code is miscompiled when targeting a big-endian architecture at all optimisation levels except -O0. It should print "checksum = 00000008", but actually prints "checksum = 00000000". It compiles correctly if I change the statement just before the call to func_13 from l_15.f0 to l_15.f1 (the result of this expression is unused). The only effect of that change on the IR is that the argument passed to func_13 changes from 0x800000180 to 0x800018000.

The problem seems to be in the 'scalar replacement of aggregates' (SROA) pass. It arises because we have a 7-byte type but the alloca is 8 bytes (because it's 4-byte aligned), so the aggregate is split into two slices that are nominally 4 bytes each, although one of them actually holds only 3 bytes. The pass takes endianness into account, and therefore adds a shift instruction when inserting an integer into an alloca store:

if (DL.isBigEndian())
  ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset);

In this particular example an integer value of the wrong size (i32) is passed to the function that computes the shift amount (‘insertInteger’). This yields a zero shift amount, since IntTy = i64, Ty = i32 and Offset = 4 bytes give 8 * (8 - 4 - 4) = 0. My patch passes a value with Ty = i24 to ‘insertInteger’ instead, which gives 8 * (8 - 3 - 4) = 8.
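The arithmetic above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: shiftAmountBits and rebuildBigEndian are hypothetical helper names, not LLVM functions, and the field values are taken from the constant { i32 8, i8 0, i8 1, i8 -128 } used in the test case of this revision.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shift computation quoted above.
// All sizes and the offset are in bytes; the result is in bits.
static uint64_t shiftAmountBits(bool isBigEndian, uint64_t intStoreSize,
                                uint64_t tyStoreSize, uint64_t offset) {
  if (isBigEndian)
    return 8 * (intStoreSize - tyStoreSize - offset);
  return 8 * offset;
}

// Rebuilds the 8-byte integer from the two slices on a big-endian target.
// f1StoreSize is the store size assumed for the second slice:
// 4 models the i32 that was passed, 3 models the i24 the patch passes.
static uint64_t rebuildBigEndian(uint64_t f1StoreSize) {
  const uint64_t f0 = 8;        // bytes 00 00 00 08 at byte offset 0
  const uint64_t f1 = 0x000180; // bytes 00 01 80 at byte offset 4
  uint64_t hi = f0 << shiftAmountBits(true, 8, 4, 0);           // shift by 32
  uint64_t lo = f1 << shiftAmountBits(true, 8, f1StoreSize, 4); // shift by 0 or 8
  return hi | lo;
}
```

Under these assumptions rebuildBigEndian(4) evaluates to 0x800000180 and rebuildBigEndian(3) to 0x800018000, which coincide with the two func_13 arguments quoted in the summary.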

Diff Detail

Repository: rL LLVM

Event Timeline

labrinea updated this revision to Diff 27436. Jun 10 2015, 6:16 AM

labrinea retitled this revision to SROA produces miscompiled code for bitfield access on big-endian targets.

labrinea updated this object.

labrinea edited the test plan for this revision.

labrinea added a reviewer: chandlerc.

labrinea set the repository for this revision to rL LLVM.

labrinea added a subscriber: Unknown Object (MLST).

labrinea added a reviewer: dexonsmith. Jun 26 2015, 1:52 AM

Chandler, this is your code, could you please look at this?

Inline comment (hfinkel) on lib/Transforms/Scalar/SROA.cpp, line 2587:
This line is longer than 80 characters, and needs to be reformatted so that you don't exceed that limit.

Sorry I've not gotten to this.

This doesn't seem like the right fix... this is something that I think I got fundamentally wrong in r177055, and by inspection I can spot numerous related mini bugs that aren't being triggered.

For example, why doesn't this happen for stores as well? If they're not sharing the same broken logic, why not? I'll try to carve out time to specifically debug this problem, but I fear the fix is going to be a bit more involved than this.

Ok, please keep me informed.

Any update on this?

Chandler, could you please give me some guidance on an alternative way to resolve this?

I have investigated this thoroughly now. I believe I understand all of the interacting pieces.

Sadly, the situation is much, much worse than you might imagine. Every robust fix for this exposes a far worse consequence. So far, I've not found a robust fix that avoids dramatically regressing our ability to promote to SSA values. =/

There is a fundamental disconnect between SROA/mem2reg and how we are canonicalizing memory accesses. This is quite alarming and a source of great concern to me in addition to the big-endian miscompiles. I'm going to need to talk to several folks and look at a bunch of other options to figure out what to actually do here.

chandlerc mentioned this in rL242869: [SROA] Fix a nasty pile of bugs to do with big-endian, different alloca. Jul 21 2015, 8:33 PM

After a really, really miserable amount of debugging and work, I've figured out how to fix the multitude of issues that this unearthed. I've landed a patch in r242869 which should address all of the issues I've managed to uncover here. I hope there isn't too much fallout from my fix...

Please let me know if you're still seeing any issues.

hans mentioned this in rL242924: Merging r242869:. Jul 22 2015, 11:50 AM

The problem is resolved; you can now close the issue. Thanks!

labrinea abandoned this revision. Oct 9 2015, 3:42 AM

Revision Contents

Path	Size
lib/Transforms/Scalar/SROA.cpp	5 lines
test/Transforms/SROA/big-endian.ll	32 lines

Diff 27436

lib/Transforms/Scalar/SROA.cpp

   Value *rewriteIntegerLoad(LoadInst &LI) {
     [...]
     return V;
   }

   bool visitLoadInst(LoadInst &LI) {
     DEBUG(dbgs() << " original: " << LI << "\n");
     Value *OldOp = LI.getOperand(0);
     assert(OldOp == OldPtr);

-    Type *TargetTy = IsSplit ? Type::getIntNTy(LI.getContext(), SliceSize * 8)
+    Type *TargetTy = IsSplit ?
+      Type::getIntNTy(LI.getContext(),DL.getTypeStoreSizeInBits(NewAllocaTy))
                              : LI.getType();
     bool IsPtrAdjusted = false;
     Value *V;
     if (VecTy) {
       V = rewriteVectorizedLoadInst();
     } else if (IntTy && LI.getType()->isIntegerTy()) {
       V = rewriteIntegerLoad(LI);
     } else if (NewBeginOffset == NewAllocaBeginOffset &&
                canConvertValue(DL, NewAllocaTy, LI.getType())) {
     [...]

Inline comment (hfinkel) on the added Type::getIntNTy line: This line is longer than 80 characters, and needs to be reformatted so that you don't exceed that limit.

test/Transforms/SROA/big-endian.ll

 ; CHECK-NOT: load
 [...]
   ret i64 %ret
 ; CHECK-NEXT: %[[ext4:.*]] = zext i16 1 to i56
 ; CHECK-NEXT: %[[shift4:.*]] = shl i56 %[[ext4]], 40
 ; CHECK-NEXT: %[[mask4:.*]] = and i56 %[[insert3]], 1099511627775
 ; CHECK-NEXT: %[[insert4:.*]] = or i56 %[[mask4]], %[[shift4]]
 ; CHECK-NEXT: %[[ret:.*]] = zext i56 %[[insert4]] to i64
 ; CHECK-NEXT: ret i64 %[[ret]]
 }
+
+%struct.S0 = type { i32, i24 }
+
+@main.l_15 = private unnamed_addr constant { i32, i8, i8, i8 } { i32 8, i8 0, i8 1, i8 -128 }, align 4
+
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)
+
+define i64 @test3() {
+; CHECK-LABEL: @test3(
+entry:
+  %l_15 = alloca %struct.S0, align 4
+  %r0 = bitcast %struct.S0* %l_15 to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %r0, i8* bitcast ({ i32, i8, i8, i8 }* @main.l_15 to i8*), i64 8, i32 4, i1 false)
+  %f0 = getelementptr inbounds %struct.S0, %struct.S0* %l_15, i32 0, i32 0
+  %r1 = load i32, i32* %f0, align 4
+  %r2 = bitcast %struct.S0* %l_15 to i64*
+  %r3 = load i64, i64* %r2, align 1
+  ret i64 %r3
+; CHECK: %[[ld_f0:.*]] = load i32, i32* getelementptr inbounds ({ i32, i8, i8, i8 }, { i32, i8, i8, i8 }* @main.l_15, i64 0, i32 0), align 4
+; CHECK-NEXT: %[[ld_f1:.*]] = load i24, i24* bitcast (i8* getelementptr inbounds ({ i32, i8, i8, i8 }, { i32, i8, i8, i8 }* @main.l_15, i64 0, i32 1) to i24*), align 4
+; CHECK-NEXT: %[[ld:.*]] = load i8, i8* getelementptr inbounds (i8, i8* bitcast ({ i32, i8, i8, i8 }* @main.l_15 to i8*), i64 7), align 1
+; CHECK-NEXT: %[[ext:.*]] = zext i24 %[[ld_f1]] to i64
+; CHECK-NEXT: %[[shift:.*]] = shl i64 %[[ext]], 8
+; CHECK-NEXT: %[[mask:.*]] = and i64 undef, -4294967041
+; CHECK-NEXT: %[[insert:.*]] = or i64 %[[mask]], %[[shift]]
+; CHECK-NEXT: %[[ext1:.*]] = zext i32 %[[ld_f0]] to i64
+; CHECK-NEXT: %[[shift1:.*]] = shl i64 %[[ext1]], 32
+; CHECK-NEXT: %[[mask1:.*]] = and i64 %[[insert]], 4294967295
+; CHECK-NEXT: %[[insert1:.*]] = or i64 %[[mask1]], %[[shift1]]
+; CHECK-NEXT: ret i64 %[[insert1]]
+}
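
The mask constants in the test3 CHECK lines can be cross-checked with a small standalone sketch. It assumes each inserted field is cleared from the wide integer by AND-ing with the complement of the field's bit range before the OR; clearMask is a hypothetical helper name used only for this illustration, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a 64-bit mask with `widthBits` bits cleared,
// starting at bit `shAmt` (counted from the least significant bit).
static uint64_t clearMask(unsigned widthBits, unsigned shAmt) {
  // Guard the full-width case, since shifting a 64-bit value by 64 is undefined.
  uint64_t field = (widthBits >= 64) ? ~0ULL : ((1ULL << widthBits) - 1);
  return ~(field << shAmt);
}
```

With these definitions clearMask(24, 8) is 0xFFFFFFFF000000FF, which reads as -4294967041 when interpreted as a signed i64, and clearMask(32, 32) is 4294967295. Both agree with the `and` operands expected for the i24 field shifted by 8 and the i32 field shifted by 32.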