If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.
Depends on D9180
Differential D9181
[LSR] Generate and use zero extends
Authored by sanjoy on Apr 21 2015, 6:44 PM.
This looks reasonable. Thanks!

Does your test fail on trunk? This is what the test outputs for me:

    ok_161:                                           ; preds = %ok_158
      %lsr.iv.next = add nuw nsw i64 %lsr.iv, 1
      %4 = add i64 %0, %lsr.iv.next
      %tmp1 = trunc i64 %4 to i32
      %tmp188 = icmp slt i32 %tmp1, %tmp160
      br i1 %tmp188, label %ok_146, label %block_81

Would you be able to test both post-inc and pre-inc variants? We've a lot of bugs because of TransformForPostIncUse.
That's behavior I'm trying to avoid -- I don't want two additions on
To elaborate a little bit, here is the -debug-only=loop-reduce on the
'''
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
  reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
  reg((zext i32 %tmp156 to i64)) + 1*reg({2,+,1}<nw><%ok_146>) + imm(-2)
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
  reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>) + imm(1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({3,+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest fixup type: i64
  reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>) + imm(-1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>)
  reg((-1 + (zext i32 %tmp156 to i64))) + 1*reg({0,+,1}<nuw><%ok_146>)
  reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-4)
  reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
  -16 + reg({(16 + (8 * (zext i32 %tmp156 to i64)) + %d),+,8}<nw><%ok_146>)
  -24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({6,+,2}<%ok_146>)
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({8,+,4}<%ok_146>)
  -24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({12,+,4}<%ok_146>)
  -16 + reg(%d) + 8*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -24 + reg(%d) + 8*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -8 + reg((16 + %d)<nsw>) + 8*reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({4,+,2}<%ok_146>)
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({2,+,1}<nw><%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({0,+,4}<%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({0,+,2}<%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({0,+,1}<nuw><%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 2*reg({(4 * (zext i32 %tmp156 to i64)),+,4}<nuw><nsw><%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 4*reg({(2 * (zext i32 %tmp156 to i64)),+,2}<%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 8*reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
  -12 + reg({(12 + (4 * (zext i32 %tmp156 to i64)) + %b),+,4}<nw><%ok_146>)
  -20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({4,+,2}<%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({0,+,4}<%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 1*reg({(4 * (zext i32 %tmp156 to i64)),+,4}<nuw><nsw><%ok_146>)
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({6,+,2}<%ok_146>)
  -20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({8,+,4}<%ok_146>)
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({12,+,4}<%ok_146>)
  -8 + reg(%b) + 4*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -12 + reg(%b) + 4*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -8 + reg((12 + %b)<nsw>) + 4*reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({3,+,1}<nw><%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({0,+,2}<%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({0,+,1}<nuw><%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 2*reg({(2 * (zext i32 %tmp156 to i64)),+,2}<%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 4*reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>)
The chosen solution requires 4 regs, with addrec cost 1, plus 3 base
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest fixup type: i64
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({0,+,1}<nuw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({0,+,1}<nuw><%ok_146>)
'''
And here is the debug output after the change. LSR now "sees" a few
'''
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
  reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
  reg(%tmp156) + 1*reg({0,+,1}<nuw><%ok_146>)
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
  reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>) + imm(1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({3,+,1}<nw><%ok_146>) + imm(-3)
  reg((zext i32 %tmp156 to i64)) + 1*reg({2,+,1}<nw><%ok_146>) + imm(-2)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest fixup type: i64
  reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>) + imm(-1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
  reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>)
  reg((-1 + (zext i32 %tmp156 to i64))) + 1*reg({0,+,1}<nuw><%ok_146>)
  reg({%tmp156,+,1}<nuw><nsw><%ok_146>) + imm(-1) <- new solution +++++
  reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-4)
  reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
  -16 + reg({(16 + (8 * (zext i32 %tmp156 to i64)) + %d),+,8}<nw><%ok_146>)
  -24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({6,+,2}<%ok_146>)
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({8,+,4}<%ok_146>)
  -24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({12,+,4}<%ok_146>)
  -16 + reg(%d) + 8*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -24 + reg(%d) + 8*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -8 + reg((16 + %d)<nsw>) + 8*reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 8*reg({%tmp156,+,1}<nuw><nsw><%ok_146>) <- new solution +++++
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({4,+,2}<%ok_146>)
  -16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({2,+,1}<nw><%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({0,+,4}<%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({0,+,2}<%ok_146>)
  -16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({0,+,1}<nuw><%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 2*reg({(4 * (zext i32 %tmp156 to i64)),+,4}<nuw><nsw><%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 4*reg({(2 * (zext i32 %tmp156 to i64)),+,2}<%ok_146>)
  -16 + reg((16 + %d)<nsw>) + 8*reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
  -12 + reg({(12 + (4 * (zext i32 %tmp156 to i64)) + %b),+,4}<nw><%ok_146>)
  -20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({4,+,2}<%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({0,+,4}<%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 1*reg({(4 * (zext i32 %tmp156 to i64)),+,4}<nuw><nsw><%ok_146>)
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({6,+,2}<%ok_146>)
  -20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({8,+,4}<%ok_146>)
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({12,+,4}<%ok_146>)
  -8 + reg(%b) + 4*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -12 + reg(%b) + 4*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -8 + reg((12 + %b)<nsw>) + 4*reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 4*reg({%tmp156,+,1}<nuw><nsw><%ok_146>) <- new solution +++++
  -12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({3,+,1}<nw><%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({0,+,2}<%ok_146>)
  -12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({0,+,1}<nuw><%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 2*reg({(2 * (zext i32 %tmp156 to i64)),+,2}<%ok_146>)
  -12 + reg((12 + %b)<nsw>) + 4*reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>)
The chosen solution requires 3 regs, with addrec cost 1, plus 1 base
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
  reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest fixup type: i64
  reg({%tmp156,+,1}<nuw><nsw><%ok_146>) + imm(-1)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
  -16 + reg((16 + %d)<nsw>) + 8*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
  -12 + reg((12 + %b)<nsw>) + 4*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
'''
The new formulae let LSR choose a better solution.
Looking at the output tells me that I should definitely add something
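The register counts reported for the two chosen solutions (4 regs before the change, 3 after) can be cross-checked by collecting the distinct `reg(...)` operands of the chosen formulae. The following is only a rough sketch: `extract_regs` and `distinct_regs` are hypothetical helpers written for this illustration, not part of LSR, and the formula strings are copied from the debug output above.

```python
def extract_regs(formula: str) -> list[str]:
    """Return the contents of every reg(...) operand in an LSR formula string."""
    regs = []
    pos = 0
    while True:
        start = formula.find("reg(", pos)
        if start == -1:
            return regs
        depth = 0
        end = start + 3  # index of the parenthesis that opens reg(
        while True:
            if formula[end] == "(":
                depth += 1
            elif formula[end] == ")":
                depth -= 1
                if depth == 0:
                    break
            end += 1
        regs.append(formula[start + 4:end])
        pos = end + 1


def distinct_regs(formulae: list[str]) -> set[str]:
    """Collect the set of distinct register expressions used by a solution."""
    return {reg for formula in formulae for reg in extract_regs(formula)}


# Chosen solution before the change (copied from the debug output).
BEFORE = [
    "reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)",
    "reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)",
    "-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) + 8*reg({0,+,1}<nuw><%ok_146>)",
    "-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) + 4*reg({0,+,1}<nuw><%ok_146>)",
]

# Chosen solution after the change (copied from the debug output).
AFTER = [
    "reg({%tmp156,+,1}<nuw><nsw><%ok_146>)",
    "reg({%tmp156,+,1}<nuw><nsw><%ok_146>) + imm(-1)",
    "-16 + reg((16 + %d)<nsw>) + 8*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)",
    "-12 + reg((12 + %b)<nsw>) + 4*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)",
]

if __name__ == "__main__":
    print(len(distinct_regs(BEFORE)), "distinct registers before")
    print(len(distinct_regs(AFTER)), "distinct registers after")
```

The saving comes from every use sharing the single narrow addrec `{%tmp156,+,1}` instead of pairing `{0,+,1}` with a separate register that holds the zero-extended start value.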
I think that is a good idea. I'm not familiar with LSR so I suspect
It also looks like (I'm not 100% sure) that there's an inconsistency
The problem is that TransformForPostIncUse(zext({S,+,X})) =
Now if {S,+,1} is nuw then {zext S,+,1} = zext({S,+,1}), but
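The identity mentioned here, that `{zext S,+,1}` equals `zext({S,+,1})` when `{S,+,1}` is nuw, can be checked exhaustively on a toy model. This is a minimal sketch under stated assumptions: 4-bit and 8-bit unsigned integers stand in for i32 and i64, and all function names are hypothetical and unrelated to the ScalarEvolution API.

```python
NARROW_BITS = 4  # stands in for i32
WIDE_BITS = 8    # stands in for i64


def wrap(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits, as fixed-width unsigned arithmetic does."""
    return value & ((1 << bits) - 1)


def zext_of_narrow_addrec(start: int, i: int) -> int:
    """zext({start,+,1}) at iteration i: step in the narrow type, then widen."""
    # Zero extension preserves the unsigned value, so widening is the identity here.
    return wrap(start + i, NARROW_BITS)


def wide_addrec(start: int, i: int) -> int:
    """{zext start,+,1} at iteration i: widen the start, then step in the wide type."""
    return wrap(wrap(start, NARROW_BITS) + i, WIDE_BITS)


def is_nuw(start: int, trip_count: int) -> bool:
    """True if the narrow recurrence never wraps during trip_count iterations."""
    return start + (trip_count - 1) < (1 << NARROW_BITS)


def forms_agree(start: int, trip_count: int) -> bool:
    """True if both forms produce the same value on every iteration."""
    return all(
        zext_of_narrow_addrec(start, i) == wide_addrec(start, i)
        for i in range(trip_count)
    )


if __name__ == "__main__":
    # The narrow recurrence starting at 14 wraps on its third iteration.
    print(forms_agree(14, 2), forms_agree(14, 4))
```

In this model the two forms agree exactly when the narrow recurrence does not wrap, which is why the rewrite is only safe under the nuw flag.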
address review: