This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Perhaps a better place to catch this would be in performExtendCombine? You could look for (zext/sext/anyext (and/or i1)) there and transform it into a CSEL with optimized condition.
Here are a couple more test cases that I think this approach would catch:
int single_noselect_phi(int A, int B, int C)
{
_Bool b; if (C) { b = A < 4 || B < 2; } else { b = A > 0 && B > 0; } return b;
}
int single_ext(int A, int B, int C)
{
return (A < 4 || B < 2) + C;
}
The transformation done in LowerAND/LowerOR in this patch should always be correct I think, however it will obviously pessimize code when the result is fed back into a condition input of a node like BRCOND, CSEL, ... I think that instead of restricting this to Return inputs, this can indeed be used in more cases.
I think as LegalizeDAG works in reverse topological order, the only way we would see And i32, Or i32 (with setcc leafs) is when we previously had a zext i1->i32 or if we failed to construct a ccmp sequence because isConjunctionDisjunctionTree() was not satisfied. So I think it should be enough to test for isConjunctionDisjunctionTree() and do the transform is successful. That should also catch the cases gberry mentioned above I think (and it would IMO be cleaner than introducing CSELs earlier in the pipeline, esp. since we still have to know whether we can form a ccmp sequence at all or the transformation will be a pessimisation).
Thanks for the feedback Matthias and Geoff,
The approach here is to catch the cases where LowerSELECT failed to construct ccmp sequences, so had to be done at LowerOR/LowerAND. The current patch fails to catch the cases mentioned by Geoff because of the restriction to Return inputs. We should be able to handle other than returns and I was saving this for a follow on patch because it breaks the select_noccmp1 test case in arm64-ccmp.ll:
define i64 @select_noccmp1(i64 %v1, i64 %v2, i64 %v3, i64 %r) { %c0 = icmp slt i64 %v1, 0 %c1 = icmp sgt i64 %v1, 13 %c2 = icmp slt i64 %v3, 2 %c4 = icmp sgt i64 %v3, 4 %and0 = and i1 %c0, %c1 %and1 = and i1 %c2, %c4 %or = or i1 %and0, %and1 %sel = select i1 %or, i64 0, i64 %r ret i64 %sel }
Matthias, can you please let know why you added this test case? I think we fail to construct ccmp sequence here because the tree at or i1 is not a conjunctiondisjunction tree as we cannot push negate operation through the tree not (or (and x y) (and x z)). However, with the current approach of doing the transformation at LowerOR/LowerAND we can look at the sub-trees (and x y), (and x z) respectively and they can be transformed into ccmp sequences thus breaking this testcase. Is it reasonable to modify the test case if we agree on extending this to catch the cases Geoff mentioned?
Addressed reviewers comments/suggestions and removed the restriction to Return inputs.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1566 ↗ | (On Diff #45514) | No need for the extra parens around the call is isConjunctionDisjunctionTree. |
1568 ↗ | (On Diff #45514) | Same; No need for the extra parens around the call is isConjunctionDisjunctionTree. |
1570 ↗ | (On Diff #45514) | If possible, I would hoist this check above the calls to isConjunctionDisjunctionTree. |
1702 ↗ | (On Diff #45514) | Please remove the extra whitespace. |
1710 ↗ | (On Diff #45514) | Please remove the extra whitespace. |
The code looks good to me. Now we need to collect performance results and investigate regressions, if any are exposed.
No correctness failures observed. Pending perf data. Just want to give a heads up that this new version is more invasive and impacts almost all of Spec2k/2k6.
This looks like the right direction now, however I believe that you can shorten tryLowerToAArch64Cmp() as noted below. I'd also recommend rebaseing llvm to get r258605 (I found I bug in my ccmp creation code while re-reading it during review).
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1557–1571 ↗ | (On Diff #45545) | I would assume that you can replace the first part of this function with: bool Dummy; if (!isConjunctionDisjunctionTree(Op, Dummy)) return SDValue(); That would be shorter and give let the function bail out on some subtrees that cannot be represented with ccmp sequences (not a correctness problem but you would generate unnecessary csels below if you cannot use ccmp anyway). |
Rebased and addressed Mathias comments. Still pending full perf data, so far everything tested is within noise. There is net reduction in number of static instructions which is a good thing.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1614–1615 ↗ | (On Diff #45890) | I think the same check is performed inside isConjunctionDisjunctionTree. |
Sorry for the late response. It took me some time to get the full performance data. I did not find any significant gains or regressions with this patch and there were net reductions in the number of static and dynamic instructions. Overall the data seems to be good.
Benchmark Diff
spec2006/astar -0.0314288
spec2006/bzip2 0.00814573
spec2006/dealII 0.0193043
spec2006/gcc 0.130547
spec2006/gobmk 0.155664
spec2006/h264ref -0.0788196
spec2006/hmmer -0.126024
spec2006/lbm -0.244571
spec2006/libquantum 0.197392
spec2006/mcf -0.65466
spec2006/milc -0.170548
spec2006/namd -0.0334161
spec2006/omnetpp 2.86068
spec2006/perlbench -0.598455
spec2006/povray 1.73534
spec2006/sjeng 0.184844
spec2006/soplex 0.537032
spec2006/sphinx3 1.83868
spec2006/xalancbmk -0.774806
CINT2006_GEOMEAN 0.11%
CFP2006_GEOMEAN 0.53%
CPU2006_GEOMEAN 0.27%
Only significant regressions were in xalancbmk(-0.77%) and perlbench(-0.59%) but they seemed transitory as a rerun with the latest tip shows the data is noisy,
spec2006/xalancbmk 0.232571
spec2006/perlbench 0.164644
Hi Marcello,
Without making it LegalOrCustom it will break these lit-tests:
CodeGen/AArch64/arm64-abi-varargs.ll
CodeGen/AArch64/arm64-aapcs.ll
Would it help if we add a check to let legalize expand this if it isn't legal type yet during cutom lowering?
Adding something like in tryLowerToAArch64Cmp:
if (!DAG.getTargetLoweringInfo().isTypeLegal(VT)) return SDValue();
Thanks,
Balaram