Page MenuHomePhabricator

[RISCV] Custom-legalise 32-bit variable shifts on RV64

Authored by asb on Jan 22 2019, 10:45 PM.



The previous DAG combiner-based approach had an issue with infinite loops between the target-dependent and target-independent combiner logic (see PR40333). Although this was worked around in rL351806, the combiner-based approach is still potentially brittle and can fail to select the 32-bit shift variant when profitable to do so, as demonstrated in the pr40333.ll test case.

This patch instead introduces target-specific SelectionDAG nodes for SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a good example of how this approach can improve codegen.

There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new instruction sequences are semantically equivalent.

It likely makes sense to replace the 32-bit sdiv/udiv/srem combining logic in a similar way, but that belongs in a separate patch.

Diff Detail


Event Timeline

asb created this revision.Jan 22 2019, 10:45 PM
asb updated this revision to Diff 183091.Jan 23 2019, 5:52 AM

Refactored patch slightly in a way that will make it easier to add support for legalising sdiv/udiv/urem in the same way.

Also remove TODO from pr40333.ll

rogfer01 added inline comments.Jan 23 2019, 11:23 AM
541 ↗(On Diff #183091)

The documentation of ReplaceNodeResults already states this requirement of matching types between the original node and the new node. Looks to me the elaborated explanation here regarding the ty->ty no-op extends might be confusing.

I think we can go with a simple reminder that explains the final truncate.

680 ↗(On Diff #183091)

I don't understand the check isUInt<32>(Imm) and I suspect that given where this fragment is used (only in riscv_s*w nodes) this may not be needed.

In fact, I removed the check and the existing tests still pass. If this is actually needed we may be missing some coverage here.

efriedma added inline comments.Jan 23 2019, 12:47 PM
37 ↗(On Diff #183091)

Probably worth explicitly noting whether the shift amount is modulo. If it is, you might want to implement SimplifyDemandedBitsForTargetNode.

asb marked an inline comment as done.Jan 23 2019, 9:44 PM
asb added inline comments.
37 ↗(On Diff #183091)

Yes, only the lower 5 bits are used. Thanks for the tip re SimplifyDemandedBitsForTargetNode. I had hoped that would allow me to remove the patterns that match a masked shamt, as the DAG combiner will remove the mask instead. This works for the test cases with a sign-extended or zero-extended result, but not anyext as SimplifyDemandedBits is never called.

I'm not entirely sure what the best behaviour is here:

  • Leaving as-is
  • Leaving a pattern to handle the anyext case but adding SimplifyDemandedBitsForTargetNode
  • Adding a target DAG combine that calls SimplifyDemandedBits on the Shamt.
asb updated this revision to Diff 183254.Jan 23 2019, 10:57 PM
asb marked 4 inline comments as done.

Update to address review feedback (thanks!). Only a single pattern is needed for each DAG node now thanks to:

  • Adding a DAG combine that does SimplifyDemandedBits on the operands (only lower 32-bits of first operand and lower 5 bits of second operand are read). For this purpose, this seems better than implementing SimplifyDemandedBitsForTargetNode as there is no guarantee that would be called (and it's not for e.g. the anyext return test cases)
  • Implementing ComputeNumSignBitsForTargetNode
This revision is now accepted and ready to land.Jan 24 2019, 10:57 AM
This revision was automatically updated to reflect the committed changes.