This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Don't force bases of foldable formulae to the final type.
Closed, Public

Authored by ebevhan on Jan 16 2018, 7:01 AM.

Details

Reviewers

atrick
qcolombet
sanjoy

Commits

rG6d06976e74dc: [LSR] Don't force bases of foldable formulae to the final type.
rL323946: [LSR] Don't force bases of foldable formulae to the final type.

Summary

Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
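To make the 'reverse' GEP concrete, here is a rough C++ analogy rather than actual LSR output. The helper names are hypothetical, plain integer arithmetic stands in for inttoptr, ptrtoint and the GEP, and a flat address space is assumed. Both forms compute the same address; they differ only in which operand plays the pointer role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: models the 'reverse' form, where the integer base is
// treated as the pointer and the real pointer is demoted to an integer index.
inline std::uintptr_t reversedAddress(std::uintptr_t IntBase, const char *Ptr) {
  std::uintptr_t AsPointer = IntBase;                             // stands in for inttoptr
  std::uintptr_t AsIndex = reinterpret_cast<std::uintptr_t>(Ptr); // stands in for ptrtoint
  return AsPointer + AsIndex;                                     // stands in for the GEP
}

// Hypothetical helper: the natural form, pointer as base and integer as index.
inline const char *naturalAddress(const char *Ptr, std::size_t Index) {
  return Ptr + Index;
}

// Usage sketch: on a flat address space both forms name the same byte.
inline void checkFormsAgree() {
  char Buf[16] = {};
  assert(reversedAddress(5, Buf) ==
         reinterpret_cast<std::uintptr_t>(naturalAddress(Buf, 5)));
}
```

On a target where pointers and integers live in different registers, the first form moves both operands into the "wrong" kind of register before the address is formed, which is the inefficiency this patch is meant to avoid.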

Event Timeline

ebevhan created this revision. Jan 16 2018, 7:01 AM

Hi,

Could you upload the patch with the full context?

I'd like to check the type is consistent with the surrounding code.

Cheers,
-Quentin

Included the complete context. Sorry for missing that!

qcolombet added inline comments. Jan 26 2018, 3:28 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996	I don't think this is what we want: FullV is supposed to have the same type as F. That said, I would have expected your test case to be recognized by the check on line 4948. Could you check why this is not happening?

Out of curiosity, do you actually see codegen differences with that patch?

lib/Transforms/Scalar/LoopStrengthReduce.cpp
5012	I believe we would have the same problem here.
5022	And here.

Basically, I am fine either way (with or without the cast), but we would need to be consistent.

In D42103#989687, @qcolombet wrote:

Out of curiosity, do you actually see codegen differences with that patch?

The only difference in the test case is that the base and scale registers on the memory operation are swapped.

In our downstream target we see fairly large differences because our pointers and integers go in different registers, so it's inefficient to do a GEP like this where the base and scale are 'swapped'.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996	In the test case in this patch (and it's probably the same for the local test case I found this in originally), the check on line 4948 does trigger. Ty is i64 and OpTy is i8*. Since the effective SCEV type for both of these is i64, Ty is set to i8* and the base (which is originally an integer) is converted into a pointer when it's expanded.
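The type selection described in this comment can be sketched with a toy model. The enum and function names below are hypothetical stand-ins, not the real llvm::Type or ScalarEvolution API, and the model assumes a 64-bit target where a pointer's effective type is i64.

```cpp
#include <cassert>

// Hypothetical stand-ins for the types discussed in the comment above.
enum class ToyType { I8, I64, I8Ptr };

// Assumption: on a 64-bit target a pointer is treated as i64 for SCEV purposes;
// integer types are their own effective type.
constexpr ToyType effectiveType(ToyType T) {
  return T == ToyType::I8Ptr ? ToyType::I64 : T;
}

// Models the check: if the formula type and the operand type have the same
// effective type, expand directly to the operand type.
constexpr ToyType pickExpansionType(ToyType FormulaTy, ToyType OpTy) {
  return effectiveType(FormulaTy) == effectiveType(OpTy) ? OpTy : FormulaTy;
}

// Usage sketch: the scenario from the test case, plus a case where the
// effective types differ and the formula type is kept.
inline void checkPatchScenario() {
  assert(pickExpansionType(ToyType::I64, ToyType::I8Ptr) == ToyType::I8Ptr);
  assert(pickExpansionType(ToyType::I64, ToyType::I8) == ToyType::I64);
}
```

With Ty = i64 and OpTy = i8*, the model picks i8*. Passing that type when expanding the integer base is what forces the inttoptr described in the summary.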

Alright, LGTM then.
Please also patch the two other places I mentioned, assuming you can come up with a test case for them.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996	Thanks for double checking.

This revision is now accepted and ready to land. Jan 29 2018, 4:15 PM

I replaced the other two instances of Ty with nullptr, and only observed a difference in one AMDGPU test case (preserve-addrspace-assert.ll), likely caused by the change on line 5022. Before, GEPs for double* and i32* were produced, but with the patch, GEPs for the containing struct are produced instead. This is probably because the expander was being forced to emit the pre-offset expressions as the final types (which were double* and i32*) but with the patch it will emit GEPs that take the struct pointer instead.

It's not really clear if this is better or not, and I can't really say that I have a test case that represents this change specifically. Should I upload a diff with the fixes to the test case or leave this patch as is?

Let's leave the patch as is.

Rebased. Another test case was affected, but as far as I can tell, the only difference is the elimination of two bitcasts.

LGTM

Closed by commit rL323946: [LSR] Don't force bases of foldable formulae to the final type. (authored by uabelho). Jan 31 2018, 10:40 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
lib/Transforms/Scalar/LoopStrengthReduce.cpp             2 lines
test/Transforms/LoopStrengthReduce/X86/nested-loop.ll    5 lines

Diff 130837

lib/Transforms/Scalar/LoopStrengthReduce.cpp

(4,987 lines not shown; enclosing context: if (F.Scale != 0) {)

     } else {
       // Otherwise just expand the scaled register and an explicit scale,
       // which is expected to be matched as part of the address.

       // Flush the operand list to suppress SCEVExpander hoisting address modes.
       // Unless the addressing mode will not be folded.
       if (!Ops.empty() && LU.Kind == LSRUse::Address &&
           isAMCompletelyFolded(TTI, LU, F)) {
-        Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
+        Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), nullptr);
         Ops.clear();
         Ops.push_back(SE.getUnknown(FullV));
       }
       ScaledS = SE.getUnknown(Rewriter.expandCodeFor(ScaledS, nullptr));
       if (F.Scale != 1)
         ScaledS =
             SE.getMulExpr(ScaledS, SE.getConstant(ScaledS->getType(), F.Scale));
       Ops.push_back(ScaledS);
     }
   }

   // Expand the GV portion.
   if (F.BaseGV) {
     // Flush the operand list to suppress SCEVExpander hoisting.
     if (!Ops.empty()) {
       Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
       Ops.clear();
       Ops.push_back(SE.getUnknown(FullV));
     }
     Ops.push_back(SE.getUnknown(F.BaseGV));
   }

   // Flush the operand list to suppress SCEVExpander hoisting of both folded and
   // unfolded offsets. LSR assumes they both live next to their uses.
   if (!Ops.empty()) {
     Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
     Ops.clear();
     Ops.push_back(SE.getUnknown(FullV));
   }

   // Expand the immediate portion.
   int64_t Offset = (uint64_t)F.BaseOffset + LF.Offset;
   if (Offset != 0) {
     if (LU.Kind == LSRUse::ICmpZero) {

(488 lines not shown)

test/Transforms/LoopStrengthReduce/X86/nested-loop.ll

(23 lines not shown)

 for.body2.preheader:                              ; preds = %for.body
   br label %for.body2

 ; Check LSR only generates two induction variables for for.body2 one for compare and
 ; one to shared by multiple array accesses.
 ; CHECK: for.body2:
 ; CHECK-NEXT: [[LSRAR:%[^,]+]] = phi i8* [ %scevgep, %for.body2 ], [ %maxarray, %for.body2.preheader ]
 ; CHECK-NEXT: [[LSR:%[^,]+]] = phi i64 [ %lsr.iv.next, %for.body2 ], [ %0, %for.body2.preheader ]
 ; CHECK-NOT: = phi i64 [ {{.*}}, %for.body2 ], [ {{.*}}, %for.body2.preheader ]
-; CHECK: [[LSRINT:%[^,]+]] = ptrtoint i8* [[LSRAR]] to i64
 ; CHECK: [[SCEVGEP1:%[^,]+]] = getelementptr i8, i8* [[LSRAR]], i64 1
 ; CHECK: {{.*}} = load i8, i8* [[SCEVGEP1]], align 1
-; CHECK: [[SCEVGEP2:%[^,]+]] = getelementptr i8, i8* %1, i64 [[LSRINT]]
+; CHECK: [[SCEVGEP2:%[^,]+]] = getelementptr i8, i8* [[LSRAR]], i64 %0
 ; CHECK: {{.*}} = load i8, i8* [[SCEVGEP2]], align 1
-; CHECK: [[SCEVGEP3:%[^,]+]] = getelementptr i8, i8* {{.*}}, i64 [[LSRINT]]
+; CHECK: [[SCEVGEP3:%[^,]+]] = getelementptr i8, i8* [[LSRAR]], i64 {{.*}}
 ; CHECK: store i8 {{.*}}, i8* [[SCEVGEP3]], align 1
 ; CHECK: [[LSRNEXT:%[^,]+]] = add i64 [[LSR]], -1
 ; CHECK: %exitcond = icmp ne i64 [[LSRNEXT]], 0
 ; CHECK: br i1 %exitcond, label %for.body2, label %for.inc.loopexit

 for.body2:                                        ; preds = %for.body2.preheader, %for.body2
   %indvars.iv = phi i64 [ 1, %for.body2.preheader ], [ %indvars.iv.next, %for.body2 ]
   %arrayidx1 = getelementptr inbounds i8, i8* %maxarray, i64 %indvars.iv

(24 lines not shown)