This patch generates tbz/tbnz when comparing against zero. The tbz/tbnz tests the sign bit directly, which lets us convert
op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0
to
op w1, w1, w10
tbnz w1, #31, .LBB0_0
Please have a look.
Chad
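
For readers who want to convince themselves of the equivalence behind the patch summary above, here is a minimal sketch in C. It assumes 32-bit (w-register) values; the helper names are hypothetical and are not part of the patch. Subtracting zero cannot overflow, so the signed "less than" after cmp w1, #0 reduces to "bit 31 is set", which is what tbnz w1, #31 tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the decision made by "cmp w1, #0" followed by "b.lt". */
int lt_zero_via_cmp(int32_t x) {
    return x < 0;
}

/* Hypothetical helper: the decision made by "tbnz w1, #31" (branch if bit 31 is set). */
int lt_zero_via_bit31(int32_t x) {
    return (int)(((uint32_t)x >> 31) & 1u);
}

/* Asserts that both forms agree on a handful of boundary values. */
void check_sign_bit_equivalence(void) {
    const int32_t samples[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(lt_zero_via_cmp(samples[i]) == lt_zero_via_bit31(samples[i]));
    }
}
```

Calling check_sign_bit_equivalence() from a test driver should pass silently; this is only a sanity check on sample values, not a proof and not the combine itself.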
Differential D4440
[AArch64] Generate tbz/tbnz when comparing against zero.
Authored by mcrosier on Jul 9 2014, 1:07 PM.

The patch looks good, and I am researching an issue which is a bit like this one. I think there should be more test cases, for example:
    cmp w0, #1 // =1
    b.lt .LBB

I have tried your patch and test. Here, I doubt whether it has something to do with add/sub or adds/subs, and I have also tried some test cases written by myself:
void foo();
void ge(int tmp) { if (tmp >= 0) foo(); }
and the asm is below:
ge: // @ge
// BB#0: // %entry
    cmp w0, #0 // =0
    b.lt .LBB1_2
// BB#1: // %if.then
    b foo
.LBB1_2: // %if.end
    ret
.Ltmp3:
    .size ge, .Ltmp3-ge
I think the cmp and b.lt above can be combined into tbz/tbnz. I think your patch should cover this case. When I change the C code to (tmp > 0), it generates
gt: // @gt
// BB#0: // %entry
    cmp w0, #1 // =1
    b.lt .LBB0_2
// BB#1: // %if.then
    b foo
.LBB0_2: // %if.end
    ret
I think it may not be easy to generate TBZ/TBNZ for this case; what is your opinion?
Thanks

The difference between add/sub and adds/subs is that the latter sets the NZCV bits.
AFAICT, that case can be handled by a similar combine, but it shouldn't block this patch. Feel free to submit a patch of your own.
I don't think that this is possible, as you're not strictly checking the sign bit. You would need two checks: one to see if the value is zero and another to check if it's negative.
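
To illustrate the two-check point for the (tmp > 0) case, here is a minimal sketch in C, again assuming 32-bit values and using hypothetical helper names. Zero has a clear sign bit but is not greater than zero, so a single bit-31 test cannot stand in for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a tbz-style test, true when bit 31 is clear (x >= 0). */
int sign_bit_is_clear(int32_t x) {
    return (((uint32_t)x >> 31) & 1u) == 0;
}

/* Hypothetical helper: the condition from the (tmp > 0) test case. */
int gt_zero(int32_t x) {
    return x > 0;
}

/* The two checks: the value is not negative AND the value is not zero. */
int gt_zero_two_checks(int32_t x) {
    return sign_bit_is_clear(x) && x != 0;
}

/* Asserts that the sign bit alone gets zero wrong, while two checks match. */
void check_gt_zero_needs_two_checks(void) {
    const int32_t samples[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
    assert(sign_bit_is_clear(0) != gt_zero(0)); /* counterexample: x == 0 */
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(gt_zero_two_checks(samples[i]) == gt_zero(samples[i]));
    }
}
```

The only sample where the single sign-bit test disagrees with (tmp > 0) is zero, which is why the second check is needed.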
Chad

Hi Chad,
2014-07-10 4:08 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>:
I think there are two cases around this,
ANDS, ORRS and others etc.
present performBRCONDCombine can cover this scenario. The patch would be more complete if you can add more instructions for case
For case 2, it can be a separate patch, I think.
Thanks,

Hi Jiangning,
According to the documentation I have, the only other scalar operations that set the condition flags are ADCS, BICS, NEGS, NGCS, and SBCS. I'll look into adding these operations as well, but I'm not sure we'll hit these cases very often, if at all. I'll investigate adding an implementation in InstCombine. Thanks for the suggestion!
Chad

Jiangning,
The performance numbers are slightly better, but still less than noise. All, please take a look.
Chad

To be clear, the numbers are slightly better relative to the ADD/SUB only combine. Overall, we still see a large improvement in eembc/OAv2.

Hi Chad,
I'm happy with this, and your new patch looks good to me if only you can
Thanks,
2014-07-30 0:02 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>:

Hi Chad,
LGTM now!
Thanks,
2014-07-31 23:23 GMT+08:00 Chad Rosier <mcrosier@codeaurora.org>: