This is an archive of the discontinued LLVM Phabricator instance.

[IVUser] Add a trunc if used only by a single IV user
Needs ReviewPublic

Authored by junbuml on Jun 15 2016, 12:46 PM.

Details

Summary

Add a trunc as an IV user if it is used only by a single IV user that is added
directly through the trunc and there are other IV users that has the same SCEV
recurrence as the trunc. By adding such trunc, we expect that LSR chooses a
better solution in case where there are other IV users which have the same SCEV
recurrence as the trunc.

Motivating case in AArch64 is added as a testcase where extra IV increments are
added in LSR which picked suboptimal formulae.

For the loop :

for.body.preheader:
  %n_sext = sext i32 %n to i64
  br label %for.body 
for.body:
  %K.in = phi i64 [ %n_sext, %for.body.preheader ], [ %K, %for.body ]
  %K = add i64 %K.in, 1
  %StoredAddr = getelementptr inbounds [12 x i32], [12 x i32]* @gvarray, i64 0, i64 %K
  %StoredValue = trunc i64 %K to i32
  store volatile i32 %StoredValue, i32* %StoredAddr
  %cmp = icmp sgt i64 %K, 1
  br i1 %cmp, label %for.body, label %for.end

Before this change :

.LBB0_2:
  add x11, x9, x10, lsl #2
  add x10, x10, #1
  str w8, [x11, #4]
  add w8, w8, #1
  cmp x10, #1
  b.gt .LBB0_2

After this change :

.LBB0_2:
add	x10, x8, #1
cmp		x8, #1
str	w8, [x9, x8, lsl #2]
mov	 x8, x10
b.gt	.LBB0_2

Diff Detail

Event Timeline

junbuml updated this revision to Diff 60880.Jun 15 2016, 12:46 PM
junbuml retitled this revision from to [LSR] Use post-inc expression for post-inc user.
junbuml updated this object.
junbuml updated this revision to Diff 60884.Jun 15 2016, 12:53 PM
junbuml added a subscriber: llvm-commits.
atrick edited edge metadata.Jun 16 2016, 4:31 PM

This is definitely a case worth fixing.

The "problem" with this case from LSR point of view is that SCEV is
too smart and hoists the 'trunc' out of the recurrence. This makes
it look like the expression that feeds the store and the compare are
based on different initial values.

This patch looks like it goes through a fairly involved analysis
before LSR to undo the normalization that happens during IVUsers. I
think that only obscures the input to LSR. It happens to generate
better code in this case, but LSR still thinks that two registers are
needed for the loop counter. So LSR hasn't really been fixed, and I'm
afraid this will prevent LSR from optimizing other cases.

Look at the output of LSR *after* applying this patch:

The chosen solution requires 3 regs, with addrec cost 2, plus 1 scale cost, plus 1 setup cost:

LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i64
  reg({(sext i32 %n to i64),+,1}<%for.body>)
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
  reg({%n,+,1}<%for.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg(@gvarray) + 4*reg({(sext i32 %n to i64),+,1}<%for.body>)

So the problem has not been fixed in LSR. We just get lucky and "replaceCongruentPhis" happens to cleanup the extra IV:

INDVARS: Eliminated congruent iv.inc: %lsr.iv.next = add i32 %lsr.iv, 1
INDVARS: Eliminated congruent iv: %lsr.iv = phi i32 [ %lsr, %for.body ], [ %n, %for.body.preheader ]

I have two suggestions. I'm not sure either will work:

(1) Fix IVUsers so that the truncate is itself the IV users. Maybe do this only if the trunc has a single user and/or the use it itself not an "interesting" SCEV. Then LSR will see the same SCEV recurrence for all users. This will be easy, but I don't know if it will mess up any other cases.

(2) Improve replaceCongruentPhis to cleanup these sort of off-by-one-with-postinc-user phis. This requires a fair amount of pattern matching.

In any case, the important, and time-consuming part will be evaluating AArch64 performance on the LLVM test suite after the change. Although if there are few changes, it may be sufficient just to look at the disassembly diffs to see that they're reasonable.

Actually, I'm sure fix #1 works in this case, I'm just not sure it won't regress anything else.

junbuml updated this revision to Diff 61822.Jun 24 2016, 2:07 PM
junbuml retitled this revision from [LSR] Use post-inc expression for post-inc user to [IVUser] Add a trunc if used only by a single IV user.
junbuml updated this object.
junbuml edited edge metadata.

(1) Fix IVUsers so that the truncate is itself the IV users. Maybe do this only if the trunc has a single user and/or the use it itself not an "interesting" SCEV. Then LSR will see the same SCEV recurrence for all users. This will be easy, but I don't know if it will mess up any other cases.

Based on Andrew's suggestion #1, updated the patch and summary.

Looked at the disassembly diffs in spec2000/2006. This patch is not applied in spec2000, but applied in 6 benchmarks (hmmer,h264ref, omnetpp, gobmk, sphinx3, and gcc) in Spec2006. Except gcc in which one more add instruction is added in a loop, all diffs looks reasonable to me as it remove more instructions. I will continue running performance tests and update with them.

Does this patch still need review, or has it been subsumed by other changes?

Does this patch still need review, or has it been subsumed by other changes?

I deprioritized this because I didn't see clear performance gains in benchmarks I was targeting. However, there was no other changes covering this issue, and this change is still good to be reviewed.

It would be great to fix the example shown in the summary. However, I'm concerned that the proposed heuristic could hurt codegen in other cases.The cases where this heuristic generates worse code should be identified to determine whether this is the right strategy.

Maybe this should just be a bug report since the patch is in limbo?

Thanks Andrew for reviewing this old change. Let me try to check any code change with/without this patch in llvmTS and spec2000/2006. I added a comment in Bugzilla (https://llvm.org/bugs/show_bug.cgi?id=26913), mentioning this change.

dexonsmith resigned from this revision.Oct 19 2020, 6:53 PM