This is an archive of the discontinued LLVM Phabricator instance.

[DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1
ClosedPublic

Authored by deadalnix on Aug 5 2022, 7:32 AM.

Details

Summary

This seems to be beneficial overall, except for midpoint-int.ll .

The X86 backend seems to generate zeroing that are not necesary.

Diff Detail

Event Timeline

deadalnix created this revision.Aug 5 2022, 7:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 7:32 AM
deadalnix requested review of this revision.Aug 5 2022, 7:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 7:32 AM
deadalnix added inline comments.Aug 5 2022, 7:39 AM
llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
163–166

This seems to be unnecessary.

pengfei added inline comments.Aug 5 2022, 8:02 AM
llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
163–166
deadalnix added inline comments.Aug 5 2022, 8:48 AM
llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
163–166

Good catch, thanks.

deadalnix updated this revision to Diff 450454.Aug 5 2022, 5:52 PM

Reabse and variosu tweaks.

deadalnix retitled this revision from [DAG] select Cond, C, -1 --> or (sext Cond), C if Cond is MVT::i1 to [DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1.Aug 5 2022, 6:51 PM
deadalnix added inline comments.Aug 6 2022, 6:52 PM
llvm/test/CodeGen/X86/select_const.ll
290

TODO: (sext Cond) | (Pow2 - 1) -> Pow2 - (zext (not Cond)) ?

Maybe if inverting Cond if for free? Is there a way to check for this?

Or maybe this is best reserved for X86DAGToDAGISel::matchAddressRecursively?

deadalnix added inline comments.Aug 6 2022, 7:05 PM
llvm/test/CodeGen/X86/select_const.ll
290

I meant (sext Cond) | (Pow2 - 1) -> (zext (not Cond)) * Pow2 - 1 ?

deadalnix added inline comments.Aug 7 2022, 7:37 AM
llvm/test/CodeGen/X86/select_const.ll
290

Doing it in selectLEAAddr doesn't work as the score doesn't get high enough.

deadalnix added inline comments.Aug 7 2022, 6:26 PM
llvm/test/CodeGen/X86/memcmp.ll
371–372

In both cases here, cmpl %edx, %ecx seems to be recomputed for no reason (?) It seems to be able to reuse cmpl %ecx, %eax in length5 and cmpw %dx, %cx in length3, at least before this diff.

Is there a reason why this isn't the case here? I'm not quite sure how memcmp is being lowered, and -debug really isn't useful here. @gchatelet , you were looking for ways to improve memcmp, I think there is a low hanging fruit somewhere in there.

pengfei added inline comments.Aug 7 2022, 6:43 PM
llvm/test/CodeGen/X86/memcmp.ll
371–372

xor will change EFLAGS, so we need to recomputed to get the correct EFLAGS.

deadalnix added inline comments.Aug 7 2022, 7:05 PM
llvm/test/CodeGen/X86/memcmp.ll
371–372

Sure, but the xor itself doesn't seem necessary. For instance, length5 does:

setae %al
movzbl %al, %eax

Which does away witth he need for the xor, with in turn does away with the need to recompute the cmpl. Both function do the same thing in that block, there is no reason to have different codegen.

please can you rebase this against trunk latest?

RKSimon added inline comments.Aug 10 2022, 12:13 PM
llvm/test/CodeGen/X86/select_const.ll
204

Comment doesn't match select_lea_2 (but the others are still OK).

RKSimon added inline comments.Aug 11 2022, 2:56 AM
llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll
1212 ↗(On Diff #451591)

regression?

deadalnix added inline comments.Aug 15 2022, 6:00 AM
llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll
1212 ↗(On Diff #451591)

I have no idea, but think so.

deadalnix updated this revision to Diff 455002.Aug 23 2022, 4:07 PM

I restored the select_cc creation, but now it generates some select_cc that the powerpc backend doesn't quite know what to do with.

Adding some PPC people who might be able to advise on the remaining regressions

Dear PowerPC folks, your help would be greatly appreciated here.

qiucf added a reviewer: Restricted Project.Sep 4 2022, 7:52 PM
shchenz added a subscriber: shchenz.Sep 7 2022, 1:43 AM
shchenz added inline comments.
llvm/test/CodeGen/PowerPC/signbit-shift.ll
142

The changes in this file seem regressions, especially for this case, more instructions and more registers. Do we know why?

deadalnix added inline comments.Sep 7 2022, 1:42 PM
llvm/test/CodeGen/PowerPC/signbit-shift.ll
142

Yes. The PowerPC backend request for select_cc instruction. The select_cc instruction is not turned into arythmetic the way regular select instructions are.

This patch changes how select -> arithmetic transformations are done, which enable more transform in general, but specifically on PowerPC, this cases problems because it causes either select_cc to not be generated due select -> arithmetic transformations, or, alternatively, if the promotion to select_cc is prioritized, then the cases where the arithmetic transform was beneficial are not done.

Do we have a good idea when select_cc -> arithmetic is beneficial? If we have something reasonable here, we can add the select_cc transforms and get these regression dealt with.

deadalnix updated this revision to Diff 458558.Sep 7 2022, 2:20 PM

What happens if select -> arithmetic takes priority

deadalnix updated this revision to Diff 458561.Sep 7 2022, 2:25 PM

When select_cc takes precedence

deadalnix added inline comments.Sep 9 2022, 6:54 AM
llvm/test/CodeGen/PowerPC/signbit-shift.ll
142

If you want to see the difference when one takes priority over the other: https://reviews.llvm.org/D131260?vs=458558&id=458561

shchenz added inline comments.Sep 9 2022, 7:43 AM
llvm/test/CodeGen/PowerPC/signbit-shift.ll
142

Hmm, PowerPC sets SELECT_CC as custom for type i32/i64, but it does not mean the customization is better than the arithmetic way here. In fact it is worse as shown in the case. If on other targets, the customization for SELECT_CC is better, then we may need to adjust the current target hook foldSelectOfConstants()(for example, if we know the SELECT can be optimized to SELECT_CC on some target and SELECT_CC is better, we return false?) or we may need to re-implement the SELECT combining logic here on PowerPC for SELECT_CC. IMO, the first one makes more sense. Thoughts?

RKSimon added inline comments.Sep 9 2022, 8:03 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10260

Maybe adjust convertSelectOfConstantsToMath to take the Cond opcode as another argument instead?

@RKSimon I'll try what you suggest tomorrow. I've been sick so things are moving slowly on my end. Apologies.

deadalnix updated this revision to Diff 462864.Sep 26 2022, 4:33 AM

Update the condition in which we transform select to math as to preserve opportunities to optimize select_cc

RKSimon added inline comments.Sep 26 2022, 5:15 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23808

unrelated - pre-commit this?

deadalnix updated this revision to Diff 462891.Sep 26 2022, 6:28 AM

Fix the latest broken test

deadalnix updated this revision to Diff 462897.Sep 26 2022, 6:58 AM

Precommit trivial changes and rebase

LGTM @shchenz any more comments?

shchenz accepted this revision as: shchenz.Sep 27 2022, 4:40 AM

LGTM @shchenz any more comments?

PPC part looks good!

This revision is now accepted and ready to land.Sep 27 2022, 4:40 AM
This revision was landed with ongoing or failed builds.Sep 27 2022, 5:55 AM
This revision was automatically updated to reflect the committed changes.