This is an archive of the discontinued LLVM Phabricator instance.

[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs
ClosedPublic

Authored by dtcxzyw on Aug 23 2023, 2:05 PM.

Details

Summary

This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4

Diff Detail

Unit TestsFailed

Event Timeline

dtcxzyw created this revision.Aug 23 2023, 2:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 2:05 PM
dtcxzyw requested review of this revision.Aug 23 2023, 2:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 2:05 PM

This seems like it would be better as a DAG->DAG transform pre-lowering? Or perhaps a more general peephole to merge the neg(w)+li? Or do it in TableGen? I don't think this warrants custom C++ lowering.

Probably via setTargetDAGCombine(ISD::ATOMIC_LOAD_SUB)?

This seems like it would be better as a DAG->DAG transform pre-lowering? Or perhaps a more general peephole to merge the neg(w)+li? Or do it in TableGen? I don't think this warrants custom C++ lowering.

The isel patterns are creating the neg, it does seem better to create that early before isel to give maximum opportunity to combine it. The code in this patch seems very similar to AArch64.

Does this patch improve this too

define signext i32 @atomicrmw_sub_i32_monotonic(ptr %a, i32 %x, i32 %y) nounwind {

  %b = sub i32 %x, %y
  %1 = atomicrmw sub ptr %a, i32 %b monotonic
  ret i32 %1
  }

it currently generates

atomicrmw_sub_i32_monotonic:            # @atomicrmw_sub_i32_monotonic
        subw    a1, a1, a2
        neg     a1, a1
        amoadd.w        a0, a1, (a0)
        ret

But I think we could swap the operands to the subw to remove the neg.

craig.topper added inline comments.Aug 23 2023, 2:43 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1241

Why do we need to handle i32 on RV64? Wouldn't it be enough to do XLenLLT and remove the change to ReplaceNodeResults?

dtcxzyw updated this revision to Diff 552985.Aug 23 2023, 9:57 PM
dtcxzyw added a reviewer: jrtc27.
  • Rebase
  • Address comments
dtcxzyw marked an inline comment as done.Aug 23 2023, 9:58 PM
craig.topper added inline comments.Aug 24 2023, 11:01 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1239

Drop curly braces

craig.topper added inline comments.Aug 24 2023, 11:03 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2916

I don't think this safe to do this if you don't check that the VT in Operand 1 matches MemoryVT?

dtcxzyw updated this revision to Diff 553227.Aug 24 2023, 12:21 PM
  • Rebase
  • Address feedback
dtcxzyw marked 2 inline comments as done.Aug 24 2023, 12:21 PM

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

(That would allow AArch64 and RISCV to share support for this, which really doesn't need any target-specific knowledge)

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

That still makes it target-dependent, but the expansion at the IR or SelectionDAG level is target-independent. There is no reason we should have separate code for RISCV and AArch64. Sharing code rather than duplicating functionality is good practice when it makes sense, and I don't see a reason why it wouldn't here.

craig.topper added a comment.EditedAug 24 2023, 1:50 PM

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

That still makes it target-dependent, but the expansion at the IR or SelectionDAG level is target-independent. There is no reason we should have separate code for RISCV and AArch64. Sharing code rather than duplicating functionality is good practice when it makes sense, and I don't see a reason why it wouldn't here.

We can probably do this in LegalizeDAG::ExpandNode. First we need to change very target that really wants a sub LibCall to pass LibCall instead of Expand to setOperationAction. Then we could add an Expand action for ATOMIC_SUB to ExpandNode that uses NEG+ATOMIC_ADD. RISC-V and AArch64 could use the Expand action. Need to change AArch64TargetLowering constructor to figure out the cases to use Expand.

I think we also need a DAGCombine to call SimplifyDemandedBits on the operand based on the memory VT in order to remove the SIGN_EXTEND_INREG.

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

That still makes it target-dependent, but the expansion at the IR or SelectionDAG level is target-independent. There is no reason we should have separate code for RISCV and AArch64. Sharing code rather than duplicating functionality is good practice when it makes sense, and I don't see a reason why it wouldn't here.

We can probably do this in LegalizeDAG::ExpandNode. First we need to change very target that really wants a sub LibCall to pass LibCall instead of Expand to setOperationAction.

I don't think you need that part, just check if add is legal, use that if so, otherwise fall back on a libcall?

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

That still makes it target-dependent, but the expansion at the IR or SelectionDAG level is target-independent. There is no reason we should have separate code for RISCV and AArch64. Sharing code rather than duplicating functionality is good practice when it makes sense, and I don't see a reason why it wouldn't here.

We can probably do this in LegalizeDAG::ExpandNode. First we need to change very target that really wants a sub LibCall to pass LibCall instead of Expand to setOperationAction.

I don't think you need that part, just check if add is legal, use that if so, otherwise fall back on a libcall?

AArch64 seems to create add libcalls from sub. But maybe that isn't intentional?

Can we not just teach SelectionDAG to handle ATOMIC_LOAD_SUB=Expand, ATOMIC_LOAD_ADD=Legal?

I think we can let RISCVTargetLowering::shouldExpandAtomicRMWInIR(atomicrmw sub) return Expand and handle it in RISCVTargetLowering::emitExpandAtomicRMW.

That still makes it target-dependent, but the expansion at the IR or SelectionDAG level is target-independent. There is no reason we should have separate code for RISCV and AArch64. Sharing code rather than duplicating functionality is good practice when it makes sense, and I don't see a reason why it wouldn't here.

We can probably do this in LegalizeDAG::ExpandNode. First we need to change very target that really wants a sub LibCall to pass LibCall instead of Expand to setOperationAction.

I don't think you need that part, just check if add is legal, use that if so, otherwise fall back on a libcall?

AArch64 seems to create add libcalls from sub. But maybe that isn't intentional?

For outlined atomics? That's intentional, they correspond to the available LSE instructions, and behave as if they were instructions.

(But AArch64 does mark ATOMIC_LOAD_ADD as LibCall already)

(But AArch64 does mark ATOMIC_LOAD_ADD as LibCall already)

So AArch64 would fail your suggestion "just check if add is legal, use that if so, otherwise fall back on a libcall?" since add wouldn't be legal it would be libcall.

(But AArch64 does mark ATOMIC_LOAD_ADD as LibCall already)

So AArch64 would fail your suggestion "just check if add is legal, use that if so, otherwise fall back on a libcall?" since add wouldn't be legal it would be libcall.

I guess "not Expand" rather than "is Legal" then?

(But AArch64 does mark ATOMIC_LOAD_ADD as LibCall already)

So AArch64 would fail your suggestion "just check if add is legal, use that if so, otherwise fall back on a libcall?" since add wouldn't be legal it would be libcall.

I guess "not Expand" rather than "is Legal" then?

(or "is Legal or LibCall" depending on your view of Custom)

dtcxzyw updated this revision to Diff 553465.Aug 25 2023, 7:25 AM
dtcxzyw added a reviewer: olista01.
  • Rebase
  • Expand ATOMIC_LOAD_SUB to NEG+ATOMIC_LOAD_LOAD in LegalizeDAG::ExpandNode

Related patch (AArch64): D42477

jrtc27 added inline comments.Aug 25 2023, 12:25 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3137 ↗(On Diff #553465)

This should be conditional on what ATOMIC_LOAD_ADD is (see forced-atomics.ll for unnecessary churn, though that won't show that this _adds_ an instruction for non-constant operands, at least I assume)

craig.topper added inline comments.Aug 25 2023, 12:29 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3143 ↗(On Diff #553465)

This is not the correct way to find out the sign_extend_inreg can be removed. Operand 0 of SIGN_EXTEND_INREG always matches the destination type which means it always matches VT if its present. So this will delete any SIGN_EXTEND_INREG no matter what.

The check you really need is

if (RHS->getOpcode() == ISD::SIGN_EXTEND_INREG &&
    cast<VTSDNode>(RHS->getOperand(1))->getVT() == AN->getMemoryVT())

Looks like you didn't update tests for all targets affected by this patch.

dtcxzyw updated this revision to Diff 553726.Aug 26 2023, 6:34 AM
  • Rebase
  • Fix ARM/RISCV regression tests
dtcxzyw marked 2 inline comments as done.Aug 26 2023, 6:34 AM

I think this patch would change behavior atomicrmw sub for mips16, but it looks to be untested. llvm/test/CodeGen/Mips/atomicops.ll is the mips16 atomic test but it does not check all operations.

I think this patch would change behavior atomicrmw sub for mips16, but it looks to be untested. llvm/test/CodeGen/Mips/atomicops.ll is the mips16 atomic test but it does not check all operations.

I will add some pre-commit tests later.

dtcxzyw updated this revision to Diff 554154.Aug 28 2023, 8:46 PM
  • Rebase
  • Fix Mips16 regression tests
asb added a comment.Sep 5 2023, 6:32 AM

Left a couple of very minor comment. The approach seems sound to me, but there's clear potential interaction with other targets and so I'd rather rely on a LGTM from someone who's been involved in this patch since the beginning, but can take a closer look if no-one has time.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1230

This comment is now out of date and should probably just be something like "Force __sync libcalls to be emitted for atomic rmw/cas operations."

llvm/lib/Target/RISCV/RISCVInstrInfoA.td
336

This "heading" now has nothing under it (and I think was in the wrong place anyway), so best delete it.

dtcxzyw updated this revision to Diff 556335.Sep 8 2023, 8:15 PM
  • Rebase
  • Address feedback
dtcxzyw updated this revision to Diff 556336.Sep 8 2023, 8:17 PM
dtcxzyw marked 2 inline comments as done.

Update diff with full context

This revision is now accepted and ready to land.Sep 15 2023, 1:24 PM
This revision was landed with ongoing or failed builds.Sep 16 2023, 2:10 AM
This revision was automatically updated to reflect the committed changes.