Page MenuHomePhabricator

[AArch64] Fold B = csel A, A into B = COPY A
Needs ReviewPublic

Authored by jroelofs on Apr 30 2018, 11:28 AM.

Details

Summary

I found the following pattern in the LLVM test suite + SPEC 93 times:

csel B, A, A

I see that in some places after MachineCSE runs, we end up with things like:

%5:gpr32 = CSELWr %1, %1, 12, implicit $nzcv

In this case, a COPY would be much better.

I added support for optimizeSelect in AArch64InstrInfo, which is called by the PeepholeOptimizer.

Diff Detail

Unit TestsFailed

TimeTest
70 msx64 debian > LLVM.Examples/OrcV2Examples::lljit-with-remote-debugging.test
Script: -- : 'RUN: at line 4'; /mnt/disks/ssd0/agent/llvm-project/build/bin/LLJITWithRemoteDebugging /mnt/disks/ssd0/agent/llvm-project/llvm/test/Examples/OrcV2Examples/Inputs/argc_sub1_elf.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck --check-prefix=CHECK1 /mnt/disks/ssd0/agent/llvm-project/llvm/test/Examples/OrcV2Examples/lljit-with-remote-debugging.test
780 msx64 windows > lld.MachO::reproduce.s
Script: -- : 'RUN: at line 3'; rm -rf C:\ws\w32-1\llvm-project\premerge-checks\build\tools\lld\test\MachO\Output\reproduce.s.tmp.dir

Event Timeline

thegameg created this revision.Apr 30 2018, 11:28 AM

Do you have any idea why this is happening? I would expect that normally, instruction selection won't insert a useless csel. (If we're missing some fold before isel, fixing that could have a larger benefit overall.)

Do you have any idea why this is happening? I would expect that normally, instruction selection won't insert a useless csel. (If we're missing some fold before isel, fixing that could have a larger benefit overall.)

From what I've seen after ISel there are no useless csel, but they are added by the EarlyIfConverter and after some coalescing and other optimizations like CSE, we end up with some useless ones. I didn't look at the EarlyIfConverter as I thought this would catch more cases.

So EarlyIfConverter+MachineCSE is sort of acting like a primitive form of gvn-hoist. That makes sense. An actual hoisting pass would probably be more effective, though.

This does look like a heavy hammer for a small fix. From the optimisation guides, CSEL and MOV have the same latency/bandwidth and the condition is probably pipelined in anyway.

It's not really csel vs. mov; the COPY likely gets coalesced away, and it might allow erasing the condition which feeds the select, which might allow erasing more code, etc.

It's not really csel vs. mov; the COPY likely gets coalesced away, and it might allow erasing the condition which feeds the select, which might allow erasing more code, etc.

Ah, good point! Indeed, much better than I originally thought. :)

thegameg updated this revision to Diff 167927.Oct 2 2018, 5:37 AM
thegameg added a reviewer: MatzeB.
  • Fixed bugs where physregs were involved and were treated as having an unique def.
  • Added more tests.
  • Try harder to check the registers by looking through all the vregs until a physreg is found. Then compare definitions or check if it's part of the liveins list.
aemerson resigned from this revision.Jan 28 2019, 11:54 AM
jroelofs commandeered this revision.Apr 26 2021, 1:54 PM
jroelofs added a reviewer: thegameg.
jroelofs updated this revision to Diff 340636.Apr 26 2021, 1:55 PM

rebased & fixed a few issues that showed up in the tests

Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2021, 1:55 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen added inline comments.
llvm/test/CodeGen/AArch64/srem-seteq.ll
272

I think it's worth fixing this in ISel. The whole thing should be mov x0, 1, which the further simplification during DAG combine will simplify to so long as it doesn't get caught up on the CSEL.

I presume the same sort of thing could easily be done in GlobalISel.

Are there other cases motivating this patch?

jroelofs added inline comments.Apr 28 2021, 10:16 AM
llvm/test/CodeGen/AArch64/srem-seteq.ll
272

Yes, I have seen a few of the trivial matching-operands-fcsel case in the test-suite. Let me try to reduce one of them.

jroelofs added inline comments.Apr 28 2021, 11:25 AM
llvm/test/CodeGen/AArch64/srem-seteq.ll
272

Since the source of these seems to be early-ifcvt, maybe it's better to solve this problem there? https://reviews.llvm.org/D101508

jroelofs added inline comments.Apr 29 2021, 8:37 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
920

I found a case that throws a wrench in this strategy:

early-clobber %66:gpr64common, %67:gpr64 = LDRXpre %38:gpr64common(tied-def 0), 16
%74:gpr64 = CSELXr killed %67:gpr64, %66:gpr64common, 11, implicit $nzcv

Yes, I have seen a few of the trivial matching-operands-fcsel case in the test-suite. Let me try to reduce one of them.

Oh yeah. IfCvt and CSE working together I see. It always feels odd for llvm to very carefully not hoist a fdiv out of an block, just for the backend to do it anyway :)

Since the source of these seems to be early-ifcvt, maybe it's better to solve this problem there?

Sounds good to me.

I may put up a patch for the case that can be fixed in ISel too, because it should be super simple (and there's already a test case!)

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
920

Does that apply to D101508 too?

jroelofs added inline comments.Apr 29 2021, 10:30 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
920

It does, yeah. I'm adjusting that patch as we speak :)