This is an archive of the discontinued LLVM Phabricator instance.

Fix DAGCombiner match
ClosedPublic

Authored by evstupac on Nov 7 2016, 12:50 PM.

Details

Summary

The patches fix (base, index, offset) match in DAGCombine to combine the following stores/loads (a, b, c char arrays):
for (i = 0; i < n; i += 2) {

char c1 = c{*b];
char c2 = c[*b + 1];
a[i] = c1;
a[i + 1] = c2;
b++;

}
into (a, c, the same short arrays):
for (i = 0; i < n; i++) {

short c1 = c{*b];
a[i] = c1;
b++;

}

Without the patch match returned the following (base, index, offset):
*(a + i) -> (a, i, 0)
*(a + i + 1) -> (a + i, undef, 1)
and loads/stores were not combined because of different base.

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac updated this revision to Diff 77081.Nov 7 2016, 12:50 PM
evstupac retitled this revision from to Fix DAGCombiner match.
evstupac updated this object.
evstupac set the repository for this revision to rL LLVM.
evstupac added a subscriber: llvm-commits.
evstupac added a subscriber: Farhana.
RKSimon edited edge metadata.Nov 15 2016, 6:15 AM

This LGTM but I think we should probably add tests for other targets as well.

This LGTM but I think we should probably add tests for other targets as well.

There are similar tests on ARM only. When I tried to extend them I've got the following:

ldrb r12, [lr, r12]!
ldrb lr, [lr, #1]

Instead of 1 ldrh.

I'm not sure what is more profitable for ARM and which -mtriple/-mcpu is better. So I'd better leave the test inserting for ARM guys.

This LGTM but I think we should probably add tests for other targets as well.

There are similar tests on ARM only. When I tried to extend them I've got the following:

ldrb r12, [lr, r12]!
ldrb lr, [lr, #1]

Instead of 1 ldrh.

I'm not sure what is more profitable for ARM and which -mtriple/-mcpu is better. So I'd better leave the test inserting for ARM guys.

Renato/James - is there any insight you can offer here please?

The patch looks not that complicated. One of my next patches depends on it. Let's commit it and Arm guys will add test if necessary.

The patch looks not that complicated. One of my next patches depends on it. Let's commit it and Arm guys will add test if necessary.

I'd agree with you but the lack of existing test coverage makes me very nervous.

jmolloy edited edge metadata.Dec 2 2016, 6:25 AM

Hi,

This LGTM but I think we should probably add tests for other targets as well.

There are similar tests on ARM only. When I tried to extend them I've got the following:

ldrb r12, [lr, r12]!
ldrb lr, [lr, #1]

Instead of 1 ldrh.

I'm not sure what is more profitable for ARM and which -mtriple/-mcpu is better. So I'd better leave the test inserting for ARM guys.

This is not a good result. It seems fairly obvious to me that two instructions are worse than one - an LDRH would be what I would expect generated.

James

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11261

Why has IsIndexSignExt been removed here?

Given that this affects all targets, have you done performance testing using the test-suite? What were the results?

Cheers,

James

Given that this affects all targets, have you done performance testing using the test-suite? What were the results?

I'm not sure what is "the test-suite".
However I've tested this on specs and some other benchmarks. The are no significant performance changes (only noise). Some structure loads were optimized but not in hot loops.

This is not a good result. It seems fairly obvious to me that two instructions are worse than one - an LDRH would be what I would expect generated.

I'm not Arm specialist.
The only similar test I've found is:
test/CodeGen/ARM/MergeConsecutiveStores.ll
But it it has an option -mtriple=armv7-apple-darwin. For the option I get the result I've posted. I don't want to modify it.
For -mtriple=aarch64 I get expected result:

ldrh w9, [x2, x9]

But for Aarch64 there no other tests on store/load merge.
I can restructure all this how I feel is better.
But again I'm not Arm specialist. And (since there are no regressions) it is better leave all this for someone who has better knowledge in Arm.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11261

It was not removed. We'll add it when form return value BaseIndexOffset(..,..,..).
Here we just do not stop when found VAR + OFFSET, but recursively call match for the VAR. It could happen that VAR is not just BASE, but BASE + INDEX.

http://llvm.org/svn/llvm-project/test-suite/trunk

For this I can get only X86 results. Is this ok?

Hi Evgeny,

There are no new fails on LLVM LIT tests.

The LLVM LIT tests are essentially unit tests so provide a quick indication that something is broken. The test-suite is a series of actual programs and benchmarks that provide a more rigorous correctness and performance test.

The test-suite can be run via LNT - see here for example: http://llvm.org/docs/lnt/quickstart.html#running-tests . For this change, I'm worried about the potential impact of the change beyond the regression test you've written, so I'd prefer if you could run performance testing and check there are no major performance regressions. (Make sure you use -j1 when invoking LNT for this, otherwise you'll end up with noisy results! also use --multisample to get an idea of the variance as some tests can vary wildly).

For this I can get only X86 results. Is this ok?

That's fine - X86 is the only hardware that we can reasonably expect everyone to have access to.

+Chad, to run an eye over the load/store merging changes.

That's fine - X86 is the only hardware that we can reasonably expect everyone to have access to.

For X86 I've got build same for all but MiBench/consumer-lame, which has no performance difference.

rengolin edited edge metadata.Dec 12 2016, 2:07 AM
ldrb r12, [lr, r12]!
ldrb lr, [lr, #1]

Instead of 1 ldrh.

The question here is: Is LLVM emitting one LDRH today, or is it already emitting two LDRBs?

If the former, than this is a regression. If the latter, than you can safely ignore ARM.

The AArch64 side looks good, but I'd also like to know if we were emitting two byte-loads before.

However, even on the ARM side, this looks more like and issue on the ARM code-gen than on the DAG combiner, so it should be fine to push this through and fix the ARM code-gen later.

But we need to know what's the behaviour today, so that we can correctly fill the bug we open on ARM, to mention if it is a regression or not.

cheers,
--renato

Today the behavior is the same 2 byte loads. So there is no regressions. I'm just unable to add new test for ARM.

Today the behavior is the same 2 byte loads. So there is no regressions. I'm just unable to add new test for ARM.

Right, I have just confirmed this behaviour. Also, your patch improves AArch64 (as you said), so I think the best course of action now is to:

  • add the same snippet above to the ARM test, checking for two ldrbs and with a big FIXME saying that we need to fix PartialOffset in the DAGCombiner to make it work.
  • Copy that test to the AArch64 directory, change the triple to "aarch64-linux-gnu" and change the register patterns from r[0-9] to w[0-9].
  • Change the last (added) test in AArch64 to expect ldrh and see it pass.

I can help you with the tests, but the error messages should be reasonably simple to match.

James, given that this is not changing the existing behaviour on ARM and is improving x86 and AArch64 code (and didn't show variation in spec), it should be ok to merge.

cheers,
--renato

add the same snippet above to the ARM test, checking for two ldrbs and with a big FIXME saying that we need to fix PartialOffset in the DAGCombiner to make it work.

Hmm... It looks like bug report is better.

Copy that test to the AArch64 directory, change the triple to "aarch64-linux-gnu" and change the register patterns from r[0-9] to w[0-9].

I'd leave this move for separate commit from James, you or someone else from ARM. It should be easier for you and this move is not related to my fix.

Change the last (added) test in AArch64 to expect ldrh and see it pass.

If you move the test before I commit the patch I'll add new check there. If after you'll need to add it.

Thanks,
Evgeny

Today the behavior is the same 2 byte loads. So there is no regressions. I'm just unable to add new test for ARM.

Right, I have just confirmed this behaviour. Also, your patch improves AArch64 (as you said), so I think the best course of action now is to:

  • add the same snippet above to the ARM test, checking for two ldrbs and with a big FIXME saying that we need to fix PartialOffset in the DAGCombiner to make it work.
  • Copy that test to the AArch64 directory, change the triple to "aarch64-linux-gnu" and change the register patterns from r[0-9] to w[0-9].
  • Change the last (added) test in AArch64 to expect ldrh and see it pass.

I can help you with the tests, but the error messages should be reasonably simple to match.

James, given that this is not changing the existing behaviour on ARM and is improving x86 and AArch64 code (and didn't show variation in spec), it should be ok to merge.

cheers,
--renato

Ok, LGTM from my side.

rengolin accepted this revision.Dec 15 2016, 1:34 AM
rengolin edited edge metadata.
This revision is now accepted and ready to land.Dec 15 2016, 1:34 AM