Page MenuHomePhabricator

[IndVarSimplify] Extend previous special case for load use instruction to any narrow type loop variant to avoid extra trunc instruction
ClosedPublic

Authored by zhongduo on Jan 20 2020, 11:11 AM.

Details

Summary

The widenIVUse avoids generating trunc by evaluating the use as AddRec, this
will not work when:

  1. SCEV traces back to an instruction inside the loop that SCEV can not

expand, eg. add %indvar, (load %addr)

  1. SCEV finds a loop variant, eg. add %indvar, %loopvariant

While SCEV fails to avoid trunc, we can still try to use instruction
combining approach to prove trunc is not required. This can be further
extended with other instruction combining checks, but for now we handle the
following case (sub can be "add" and "mul", "nsw + sext" can be "nus + zext")

Src:
  %c = sub nsw %b, %indvar
  %d = sext %c to i64
Dst:
  %indvar.ext1 = sext %indvar to i64
  %m = sext %b to i64
  %d = sub nsw i64 %m, %indvar.ext1

Therefore, as long as the result of add/sub/mul is extended to wide type with
right extension and overflow wrap combination, no
trunc is required regardless of how %b is generated. This pattern is common
when calculating address in 64 bit architecture.

Note that this patch reuse almost all the code from D49151 by @az:
https://reviews.llvm.org/D49151

It extends it by providing proof of why trunc is unnecessary in more general case,
it should also resolve some of the concerns from the following discussion with @reames.

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180910/585945.html

Diff Detail

Event Timeline

zhongduo created this revision.Jan 20 2020, 11:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2020, 11:11 AM

@sanjoy @efriedma @sebpop @reames @az @javed.absar Could you please review and provide some feedback to this patch? I would highly appreciate your help.

amehsan accepted this revision.Mar 4 2020, 9:14 PM

@sanjoy @efriedma @sebpop @reames @az @javed.absar

Is there any concern about this patch? We have reviewed this internally and it looks good to me. It has been also tested extensively. The condition on the load instruction does not seem to have any role in correctness of the transformations and seems to be a completely arbitrary condition. I accept the patch for now, but I will wait and prefer to hear from one of the other reviewers before committing it.

This revision is now accepted and ready to land.Mar 4 2020, 9:14 PM
az added a comment.Mar 5 2020, 7:11 AM

The patch looks good to me. It is a generalization of the original patch. The only thing missing here is to describe the testing results or post them here (even though on paper, getting rid of a truncate instruction should be beneficial).

Some background: When I wrote the original patch, I had two versions: the one with the load restriction and the general one that is exactly what you have here. We tested both of them on Geekbench4 and SPEC 2000 on ARM (Exynos and A57). Both versions of the patch got around 10% improvement for one Geekbench kernel and with no change for the rest. That improvement was the reason for the patch. I chose to post the patch with the load restriction and limit the scope of the patch because we did not have the resource to test more and in particular test non ARM platforms.

zhongduo added a comment.EditedMar 5 2020, 9:11 AM

@az This problem was exposed in a small test case that we were working on. For that particular one, we have around 8% but it might be the result of preventing other optimization from happening. I didn't see significant impact with other larger test cases.

@az

Our pipeline has some differences with the default pass pipeline. The problem was exposed when playing with the pipeline. As Jimmy mentioned it is in one of the smaller benchmarks in the test suite. Overall impact is small, but on the other hand, we are not adding any compile time or any other kind of cost. So I don't see an issue. The remaining question is functional stability. The code looks quite correct to me. You have also looked into it in the past so I think on that issue we are fine. There is always a chance that we expose some other bug somewhere else. We have not observed that issue and given limited impact of the patch it is not very likely. Anyways, I think overall this should be fine to be merged. Still I wait a little more to see if any of the reviewers has any concern in the next couple of days.

az added a comment.Mar 5 2020, 10:43 AM

LGTM.

Since you discovered this problem with non-default pass pipeline, then I do not need to see your test. With the default llvm passes, it was easy for me to find tests with load where the patch can be beneficial because of limitation in alias analysis preventing some optimizations and this patch can clean up things. But, I was not able back then to find meaningful test with non-load where this patch can help. It is great that you have a test (with your non-default passes) where this patch can be useful. It seems that both of us tested this code very well on ARM. Also, theoretically removing an extra instruction should be beneficial for other architectures too.

az accepted this revision.Mar 5 2020, 10:47 AM
This revision was automatically updated to reflect the committed changes.