This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Implement missed conditional compare sequences.
ClosedPublic

Authored by bmakam on Jan 18 2016, 5:14 AM.

Details

Summary

This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Diff Detail

Repository
rL LLVM

Event Timeline

bmakam updated this revision to Diff 45171.Jan 18 2016, 5:14 AM
bmakam retitled this revision from to AArch64: Implement missed conditional compare sequences..
bmakam updated this object.
bmakam added a reviewer: mcrosier.
mcrosier edited edge metadata.
mcrosier added a subscriber: llvm-commits.

I'd like to hear @MatzeB's comments on the patch. We should be able to handle things other than returns, but we might save that for the next step.

test/CodeGen/AArch64/arm64-ccmp.ll
404 ↗(On Diff #45171)

Please add a test case for 'and' as well.

Perhaps a better place to catch this would be in performExtendCombine? You could look for (zext/sext/anyext (and/or i1)) there and transform it into a CSEL with optimized condition.
Here are a couple more test cases that I think this approach would catch:

int single_noselect_phi(int A, int B, int C)
{

_Bool b;
if (C) {
    b = A < 4 || B < 2;
} else {
    b = A > 0 && B > 0;
}
return b;

}

int single_ext(int A, int B, int C)
{

return (A < 4 || B < 2) + C;

}

MatzeB edited edge metadata.Jan 19 2016, 6:07 PM

The transformation done in LowerAND/LowerOR in this patch should always be correct I think, however it will obviously pessimize code when the result is fed back into a condition input of a node like BRCOND, CSEL, ... I think that instead of restricting this to Return inputs, this can indeed be used in more cases.
I think as LegalizeDAG works in reverse topological order, the only way we would see And i32, Or i32 (with setcc leafs) is when we previously had a zext i1->i32 or if we failed to construct a ccmp sequence because isConjunctionDisjunctionTree() was not satisfied. So I think it should be enough to test for isConjunctionDisjunctionTree() and do the transform is successful. That should also catch the cases gberry mentioned above I think (and it would IMO be cleaner than introducing CSELs earlier in the pipeline, esp. since we still have to know whether we can form a ccmp sequence at all or the transformation will be a pessimisation).

Thanks for the feedback Matthias and Geoff,

The approach here is to catch the cases where LowerSELECT failed to construct ccmp sequences, so had to be done at LowerOR/LowerAND. The current patch fails to catch the cases mentioned by Geoff because of the restriction to Return inputs. We should be able to handle other than returns and I was saving this for a follow on patch because it breaks the select_noccmp1 test case in arm64-ccmp.ll:

define i64 @select_noccmp1(i64 %v1, i64 %v2, i64 %v3, i64 %r) {
  %c0 = icmp slt i64 %v1, 0
  %c1 = icmp sgt i64 %v1, 13
  %c2 = icmp slt i64 %v3, 2
  %c4 = icmp sgt i64 %v3, 4
  %and0 = and i1 %c0, %c1
  %and1 = and i1 %c2, %c4
  %or = or i1 %and0, %and1
  %sel = select i1 %or, i64 0, i64 %r
  ret i64 %sel
}

Matthias, can you please let know why you added this test case? I think we fail to construct ccmp sequence here because the tree at or i1 is not a conjunctiondisjunction tree as we cannot push negate operation through the tree not (or (and x y) (and x z)). However, with the current approach of doing the transformation at LowerOR/LowerAND we can look at the sub-trees (and x y), (and x z) respectively and they can be transformed into ccmp sequences thus breaking this testcase. Is it reasonable to modify the test case if we agree on extending this to catch the cases Geoff mentioned?

bmakam updated this revision to Diff 45514.Jan 21 2016, 5:03 AM
bmakam updated this object.
bmakam edited edge metadata.

Addressed reviewers comments/suggestions and removed the restriction to Return inputs.

bmakam marked an inline comment as done.Jan 21 2016, 5:04 AM
mcrosier added inline comments.Jan 21 2016, 6:48 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
1566 ↗(On Diff #45514)

No need for the extra parens around the call is isConjunctionDisjunctionTree.

1568 ↗(On Diff #45514)

Same; No need for the extra parens around the call is isConjunctionDisjunctionTree.

1570 ↗(On Diff #45514)

If possible, I would hoist this check above the calls to isConjunctionDisjunctionTree.

1702 ↗(On Diff #45514)

Please remove the extra whitespace.

1710 ↗(On Diff #45514)

Please remove the extra whitespace.

bmakam updated this revision to Diff 45543.Jan 21 2016, 9:27 AM
bmakam updated this revision to Diff 45545.Jan 21 2016, 9:30 AM
bmakam marked 5 inline comments as done.

Addressed comments from Chad.

The code looks good to me. Now we need to collect performance results and investigate regressions, if any are exposed.

No correctness failures observed. Pending perf data. Just want to give a heads up that this new version is more invasive and impacts almost all of Spec2k/2k6.

This looks like the right direction now, however I believe that you can shorten tryLowerToAArch64Cmp() as noted below. I'd also recommend rebaseing llvm to get r258605 (I found I bug in my ccmp creation code while re-reading it during review).

lib/Target/AArch64/AArch64ISelLowering.cpp
1557–1571 ↗(On Diff #45545)

I would assume that you can replace the first part of this function with:

bool Dummy;
if (!isConjunctionDisjunctionTree(Op, Dummy))
   return SDValue();

That would be shorter and give let the function bail out on some subtrees that cannot be represented with ccmp sequences (not a correctness problem but you would generate unnecessary csels below if you cannot use ccmp anyway).

bmakam updated this revision to Diff 45890.Jan 25 2016, 11:45 AM

Rebased and addressed Mathias comments. Still pending full perf data, so far everything tested is within noise. There is net reduction in number of static instructions which is a good thing.

bmakam marked an inline comment as done.Jan 25 2016, 11:46 AM
MatzeB added inline comments.Jan 25 2016, 12:46 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
1614–1615 ↗(On Diff #45890)

I think the same check is performed inside isConjunctionDisjunctionTree.

bmakam updated this revision to Diff 46546.Feb 1 2016, 9:09 AM

Sorry for the late response. It took me some time to get the full performance data. I did not find any significant gains or regressions with this patch and there were net reductions in the number of static and dynamic instructions. Overall the data seems to be good.

Benchmark Diff


spec2006/astar -0.0314288
spec2006/bzip2 0.00814573
spec2006/dealII 0.0193043
spec2006/gcc 0.130547
spec2006/gobmk 0.155664
spec2006/h264ref -0.0788196
spec2006/hmmer -0.126024
spec2006/lbm -0.244571
spec2006/libquantum 0.197392
spec2006/mcf -0.65466
spec2006/milc -0.170548
spec2006/namd -0.0334161
spec2006/omnetpp 2.86068
spec2006/perlbench -0.598455
spec2006/povray 1.73534
spec2006/sjeng 0.184844
spec2006/soplex 0.537032
spec2006/sphinx3 1.83868
spec2006/xalancbmk -0.774806

CINT2006_GEOMEAN 0.11%
CFP2006_GEOMEAN 0.53%
CPU2006_GEOMEAN 0.27%

Only significant regressions were in xalancbmk(-0.77%) and perlbench(-0.59%) but they seemed transitory as a rerun with the latest tip shows the data is noisy,
spec2006/xalancbmk 0.232571
spec2006/perlbench 0.164644

bmakam marked an inline comment as done.Feb 1 2016, 9:10 AM
MatzeB accepted this revision.Feb 1 2016, 10:36 AM
MatzeB edited edge metadata.

Thanks for working on this, LGTM

This revision is now accepted and ready to land.Feb 1 2016, 10:36 AM
This revision was automatically updated to reflect the committed changes.
bmakam added a subscriber: bmakam.Feb 4 2016, 9:00 AM

Hi Marcello,

Without making it LegalOrCustom it will break these lit-tests:
CodeGen/AArch64/arm64-abi-varargs.ll
CodeGen/AArch64/arm64-aapcs.ll

Would it help if we add a check to let legalize expand this if it isn't legal type yet during cutom lowering?

Adding something like in tryLowerToAArch64Cmp:

if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
  return SDValue();

Thanks,
Balaram