This is an archive of the discontinued LLVM Phabricator instance.

Failure to hoist constant out of loop
AbandonedPublic

Authored by avt77 on Sep 20 2016, 6:17 AM.

Details

Summary

Poor code gen identified in Andreas Fredriksson's GDC 2016 Talk 'Taming the Jaguar: x86 Optimization at Insomniac Games':
http://schedule.gdconf.com/session/taming-the-jaguar-x86-optimization-at-insomniac-games

(see details in PR27136)

Diff Detail

Event Timeline

avt77 updated this revision to Diff 71919.Sep 20 2016, 6:17 AM
avt77 retitled this revision from to Failure to hoist constant out of loop.
avt77 updated this object.
avt77 added a reviewer: ABataev.
avt77 updated this revision to Diff 72020.Sep 21 2016, 3:53 AM

I updated the patch after the first review from Alexey Bataev

avt77 updated this revision to Diff 72022.Sep 21 2016, 4:32 AM

Another Alexey's request resolved.

avt77 updated this revision to Diff 72051.Sep 21 2016, 8:32 AM

Fixed the test accordingly to rksimon request

avt77 updated this object.Sep 22 2016, 7:16 AM
avt77 updated this revision to Diff 72167.Sep 22 2016, 7:20 AM
avt77 added reviewers: djasper, qcolombet, hfinkel, reames.

Metadata removed from the test

avt77 updated this revision to Diff 73252.Oct 3 2016, 2:42 AM

The previously committed test was updated to mirror changes in CodeGen. Please, review this version.

hfinkel edited edge metadata.Oct 3 2016, 6:08 AM

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

avt77 added a comment.Oct 3 2016, 6:49 AM

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

No, I did not run any special test suite. Could you give me a hint how to do it?

avt77 added a comment.Oct 7 2016, 3:39 AM

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

All tests from llvm-test-suite passed and there are no performance degradations. I used the following command:

lnt runtest nt -v -sandbox SANDBOX --cc WORKSPACE/build/bin/clang --test-suite ~/llvm-test-suite

Is it enough or I should do more testing?

reames edited edge metadata.Oct 7 2016, 3:07 PM

A couple of small comments, but fair warning: I am not a qualified reviewer for this code and won't be able to sign off.

lib/CodeGen/MachineLICM.cpp
1083

If I understand the issue correctly, this might be better phrased in terms of this predicate. Essentially, we know that we can rematerialize the constant *without* creating a copy even if there is a PHI use. I can see why this follows for loop exit phis. I'm not quite as clear I follow for phi's inside the loop.

test/CodeGen/X86/loop-search.ll
59

Just to make sure I understand the case you're trying to fix, the previous code fell over on this instruction as it is a use of the instruction to materialize the constant right?

avt77 updated this revision to Diff 74228.Oct 11 2016, 4:03 AM
avt77 edited edge metadata.

Move the comment about hoisted constant into the proper place

avt77 added inline comments.Oct 11 2016, 4:04 AM
lib/CodeGen/MachineLICM.cpp
1083

I'm not sure I understand you. The idea is very simple and you noticed about: yes, we can rematerialize the constant without creating a copy that's why it's profitable to hoist it. Is it OK?

test/CodeGen/X86/loop-search.ll
59

It seems I put my comment in the wrong place: it should be moved from line 28 to line 33. I'll fix it asap.
The previous code prepared TRUE result (constant) inside loop body (see lines 18, 19 on the left). Now we materialize this constant outside the loop (see lines 32, 33 on the right).

avt77 added a comment.Oct 14 2016, 4:39 AM

I created a simple test to check the performance:
#include <stdint.h>
#include <stdbool.h>

#define N 1000000000
#define M 100000000000000000

bool search(long needle, const uint32_t *haystack, int count) {

for (int i = 0; i < count; ++i)
  if (needle == haystack[i])
    return true;
return false;

}

long a [N];

int main () {

int i;
for (i = 0; i < N; i++) {
  a[i] = i;
}
for (long j = 0; j < M; j++) {
  search (j % N, (const uint32_t *)&a, N);
}

}
I checked this test with and without my fix like here:

time ./search.o

The improvement is about 8%.

Hal, could you give me your LGTM on this patch?

hfinkel accepted this revision.Oct 14 2016, 7:14 AM
hfinkel edited edge metadata.

I created a simple test to check the performance:
#include <stdint.h>
#include <stdbool.h>

#define N 1000000000
#define M 100000000000000000

bool search(long needle, const uint32_t *haystack, int count) {

for (int i = 0; i < count; ++i)
  if (needle == haystack[i])
    return true;
return false;

}

long a [N];

int main () {

int i;
for (i = 0; i < N; i++) {
  a[i] = i;
}
for (long j = 0; j < M; j++) {
  search (j % N, (const uint32_t *)&a, N);
}

}
I checked this test with and without my fix like here:

time ./search.o

The improvement is about 8%.

Hal, could you give me your LGTM on this patch?

I created a simple test to check the performance:
#include <stdint.h>
#include <stdbool.h>

#define N 1000000000
#define M 100000000000000000

bool search(long needle, const uint32_t *haystack, int count) {

for (int i = 0; i < count; ++i)
  if (needle == haystack[i])
    return true;
return false;

}

long a [N];

int main () {

int i;
for (i = 0; i < N; i++) {
  a[i] = i;
}
for (long j = 0; j < M; j++) {
  search (j % N, (const uint32_t *)&a, N);
}

}
I checked this test with and without my fix like here:

time ./search.o

The improvement is about 8%.

Hal, could you give me your LGTM on this patch?

Okay; LGTM. Can you also put together a patch to add your test to the test-suite? That way we can track the performance.

This revision is now accepted and ready to land.Oct 14 2016, 7:14 AM
avt77 added a comment.Oct 14 2016, 8:09 AM

Sorry, don't understand: do you mean I should add my perf test into the patch? But it is not a LIT test, it's an application. How can I add it to the test suit? I'll do it with my follow-up patch (on the next week) but please explain me how I should do it.

Sorry, don't understand: do you mean I should add my perf test into the patch? But it is not a LIT test, it's an application. How can I add it to the test suit? I'll do it with my follow-up patch (on the next week) but please explain me how I should do it.

Applications get added to the test suite (not the lit-run regressions tests). You know how to run the test suite (you did it with lnt as you indicated above). You could submit a patch to add your test to https://llvm.org/svn/llvm-project/test-suite/trunk, probably by placing it in the SingleSource/UnitTests subdirectory. The patch gets directed to llvm-commits as with this one.

avt77 requested a review of this revision.Oct 19 2016, 12:03 AM
avt77 edited edge metadata.

kparzysz, not long ago you added test tail-dup-merge-loop-headers.ll (maybe you added some other tests as well at that time). And this test (plus some others) failed now if I add this patch to the trunk code. Could you review my really tiny patch and could you give me a hint what's wrong with your code if I apply this patch?

kparzysz edited edge metadata.Oct 19 2016, 6:50 AM
kparzysz added a subscriber: iteratee.

kparzysz, not long ago you added test tail-dup-merge-loop-headers.ll (maybe you added some other tests as well at that time). And this test (plus some others) failed now if I add this patch to the trunk code. Could you review my really tiny patch and could you give me a hint what's wrong with your code if I apply this patch?

That was actually Kyle Butt (@iteratee) who added that test.

For tail-dup-merge-loop-headers.ll

That is a layout test, and your change affected the final layout. Your change is innocuous because it's really checking for the presence/absence of blocks.

In general:
By my count, you have 5 other tests to look at. Please look carefully at them to see if you can understand the intent of the test, and if the resulting code is in keeping with the intent of the test. If so, feel free to change the test to match. If you can't, try to get someone to look at the specific test.

avt77 edited edge metadata.Oct 20 2016, 7:13 AM
avt77 added subscribers: ab, jroelofs, logan.
avt77 added a comment.Oct 20 2016, 7:22 AM

logan, jroelofs, ab, iteratee, tnorthover: this patch fails your tests:
atomic-cmpxchg
cmpxchg-idioms
ifcvt-rescan-diamonds
But it seems the new version is better than the current one. Could you review the newly generated code and allow me to fix the tests?

logan, jroelofs, ab, iteratee, tnorthover: this patch fails your tests:
atomic-cmpxchg
cmpxchg-idioms
ifcvt-rescan-diamonds
But it seems the new version is better than the current one. Could you review the newly generated code and allow me to fix the tests?

Mind uploading these as full-context diffs? Phab's way of showing them as raw files isn't particularly helpful reviewing / making comments.

for ifcvt-rescan-diamonds.ll Nothing is broken, the test just gets optimized away.

Change the 0 in the phi of %cond.end84 to a load and you'll be fine. Change the CHECK lines to match.

avt77 updated this revision to Diff 75885.Oct 26 2016, 7:35 AM

I've fixed all failed tests. Test owners please review my changes: they are rather small that's why it should not take a lot of time from you.

I don't see any changes to ifcvt-rescan-diamonds.ll

avt77 added a comment.Oct 27 2016, 1:04 AM

I don't see any changes to ifcvt-rescan-diamonds.ll

Yes, the problem has disappeared - don't know why. Sorry for boring.

logan added a subscriber: danielcdh.Nov 1 2016, 9:04 AM

Hi @avt77:

For atomic-cmpxchg.ll, I found that your output is the same as the version I originally committed. The test was updated by @danielcdh in D24818 / rL284757. You may wish to add him as reviewer.

Looks like the hoisting issue described in PR27136 is already fixed at head? My guess is that the fix is from https://reviews.llvm.org/rL284757

avt77 added a comment.Nov 2 2016, 3:03 AM

Looks like the hoisting issue described in PR27136 is already fixed at head? My guess is that the fix is from https://reviews.llvm.org/rL284757

Yes, you're right: the current trunk does not have the issue with test/CodeGen/X86/loop-search.ll. It was fixed. But the hosting issue is still here: this patch really improves code for several tests. Should we continue with this patch or we should simply close it because loop_search.ll was fixed?

kparzysz resigned from this revision.Nov 2 2016, 6:39 AM
kparzysz removed a reviewer: kparzysz.
danielcdh added inline comments.Nov 2 2016, 10:18 AM
test/CodeGen/X86/loop-search.ll
15

Looks like hoisting this instruction is not the best choice because it will be executed speculatively.

More specifically, if the search fails, i.e. the loop terminated when i==count (no early exit), the performance will be worse because the hoisted movb will be redundant. OTOH, if the loop terminated with early exit (needle == haystack[i]), there will be no redundancy, but the life range of ax will be much longer and overlaps with many other life-ranges. This will add extra burden to RA. So looks to me hoisting is an overall loss here?

ABataev resigned from this revision.Feb 13 2017, 11:44 AM

Abandon this? Its seems to be fixed in trunk - there are still some issues in PR27136 but no constant hoisting.

This revision is now accepted and ready to land.Apr 7 2018, 9:17 AM
RKSimon requested changes to this revision.Apr 7 2018, 9:18 AM

Stripping approval

This revision now requires changes to proceed.Apr 7 2018, 9:18 AM
avt77 abandoned this revision.Apr 9 2018, 12:26 AM

The trunk fixed the issue.