Page MenuHomePhabricator

[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
ClosedPublic

Authored by craig.topper on Mar 19 2021, 2:33 AM.

Details

Summary

On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:

  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]

due to an icmp not being sunk.

The blocks after codegenprepare look as follows:

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !4
  %cmp = icmp sgt i16 %0, -1
  %tobool.not37 = icmp eq %struct.list_head_s* %list, null
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph

where the %tobool.not37 is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
bltz instruction and the bnez is a basic-block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !6
  %cmp = icmp sgt i16 %0, -1
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  %1 = icmp eq %struct.list_head_s* %list, null
  br i1 %1, label %return, label %land.rhs11.lr.ph

This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.

Given that the check for multiple condition registers affect only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:

  • we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp)
  • we override shouldNormalizeToSelectSequence() to let always select the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged. Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:

  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]

Diff Detail

Event Timeline

philipp.tomsich requested review of this revision.Mar 19 2021, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2021, 2:33 AM

Removed comment left-over after removing the call to setHasMultipleConditionRegisters().

I'm surprised there aren't any tests affected by this. Are you able to add some? Perhaps you could pre-commit them so we can see the effect on codegen introduced by this patch.

Additionally, do we expect any regressions? I have tinkered with this setting with another backend I used to work on, and I saw mixed results on these kinds of benchmarks.

asb added a comment.Apr 1 2021, 5:25 AM

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is just that future optimiser changes means there are more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Added test case.

Could you please clean up the test a bit? It contains references to attribute groups that don't actually exist (#0, #1). One attribute you do want to add is nounwind, to avoid the clutter caused by the CFI directives. In manually written tests we also generally don't include some of those dso_local, local_unnamed_addr, etc. In general, it would be nice to make it look less like Clang's output and more like something that can be easily read and reasoned about.

Simplified and cleaned up the test case.

asb added a comment.Apr 15 2021, 3:38 AM
In D98932#2663612, @asb wrote:

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is just that future optimiser changes means there are more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Hi Philipp - did you have any thoughts on whether an alternate fix may be viable (even if it involves adding a new hook)? As noted above, claiming not to have multiple condition registers when we do (at least according to the setHasMultipleConditionRegisters doc comment) isn't ideal.

[Looks like the mail-to-phabricator gateway truncated my reply, so here is the original message again that followed the "Alex," line.]

I did some digging and looked at the original commit that added this hook:

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimial.

So the assumption there was that a ConditionRegister would not simply be a register that holds a zero or non-zero value (as is true for RISC-V), but rather a register that has multiple condition codes encoded in it. This is simply not true for RISC-V (the same as it isn't true for MIPS), as we do not generate a Condition value, but rather a Boolean value. Note that MIPS also does not signal multiple condition registers (only PowerPC and AMDGPU do so).

I see two solutions:

  1. We make it clear in the comments that this is meant for condition values that are not 'just truth values'.
  2. We split the hook up and signal having multiple condition registers, but have a second hook that signals the 'quality' of condition registers (i.e. that these are merely booleans).

I don't see the need to clarify the comment, as "condition register" has a meaning (at least to my understanding) that is obviously orthogonal to how truth values are generated in RISC-V... if the consensus is to clarify that this view of condition register should be used, I'll happily update the comment in include/llvm/CodeGen/TargetLowering.h

I just encountered some code that had multiple seqz instructions in one basic block that are used by branch on zero in other blocks. I think it ended up leading to at least one extra spill in my case. So I was wondering if this is still being looked out.

PaoloS added a subscriber: PaoloS.Jul 8 2021, 3:02 AM
PaoloS added a comment.EditedJul 8 2021, 6:45 AM

I quite agree that hasMultipleConditionRegisters seems to be intended for targets like PowerPC (only when it treats the 32 bits of the condition register as separate entities).
It also seems to me that the hook was added to RISCV as a simpler way make the RISCV backend lower select into more more logic instructions and fewer branches, as explained here: https://reviews.llvm.org/D79268
This wasn't probably very descriptive of the condition registers situation of RISCV that differs from the one of PowerPC (and AMDGPU) for which the hook was made for.
As already said, due to a previous commit, the hook hasMultipleConditionRegisters also influences (prevents) the sinking of compare instructions (sinkCmpExpression) and I found myself that by sinking them we get better code size.
I ran the benchmarks of Embench (https://github.com/embench/embench-iot) with a custom option I added (for proof of concept: https://reviews.llvm.org/D105620) and saw that by turning off that hook we achieve better code size.
I think that the idea to create another hook that is specific to the lowering of the select instructions is good. If that's preferable to overriding shouldNormalizeToSelectSequence to just return false.
(Hope you don't mind that I extracted part of your test for the test of my proof of concept: https://reviews.llvm.org/D105620, it was neat)

Is this still being looked at? I see that by disabling HasMultpipleConditionRegisters upstream we still get better code size with embench.

I thought that this was subsumed by a different change.
If we are still interested in it, I can pick it up again…

Hi @philipp.tomsich do you know which change is that? Has it been already upstreamed? Because as far as I see by rebasing this patch on the upstream main we still have some code size reduction and the tests still succeed

asb added a comment.Sep 30 2021, 5:56 AM

Hi @philipp.tomsich do you know which change is that? Has it been already upstreamed? Because as far as I see by rebasing this patch on the upstream main we still have some code size reduction and the tests still succeed

Sorry this patch has fallen to the wayside somewhat.

It sounds like this change is triggering minor codegen improvements for multiple workloads, so if you can confirm it hasn't been subsumed by other work then I'm happy for it to land.

After reviewing, it is not subsumed by the other work — I originally
thought that some of the changes that Jessica had landed would have made
this unnecessary, but was wrong on that account.

jrtc27 added inline comments.Sep 30 2021, 6:01 AM
llvm/test/CodeGen/RISCV/sink-icmp.ll
1

Could you please show the diff to this test?

jrtc27 added inline comments.Sep 30 2021, 6:03 AM
llvm/test/CodeGen/RISCV/sink-icmp.ll
4

The RVxxI and RVxxIBT CHECK lines look identical?

It's also not clear what this test is trying to test, could you please add a short summary to it so people know what regressions to look for?

4

(though maybe seeing the diff for this would make that obvious and any regressions would be similarly obvious?)

evandro added inline comments.Sep 30 2021, 7:14 AM
llvm/test/CodeGen/RISCV/sink-icmp.ll
1

@philipp.tomsich, please commit this test before the change so that this patch shows just its improvements to it. Other than other comments, it LGTM.

craig.topper added inline comments.Oct 7 2021, 11:15 AM
llvm/test/CodeGen/RISCV/sink-icmp.ll
5

Use RV32ZBT or RV32IZBT. Don't drop the Z.

craig.topper commandeered this revision.Nov 18 2021, 7:11 PM
craig.topper edited reviewers, added: philipp.tomsich; removed: craig.topper.

@philipp.tomsich I'm going to commandeer this and see if I can get it pushed forward.

craig.topper edited the summary of this revision. (Show Details)Nov 18 2021, 7:12 PM

Rebase to just show test change. I haven't pre-committed the test to the repo yet.

llvm/test/CodeGen/RISCV/sink-icmp.ll
5

I dropped Zbt since it would only apply if there was a select in the test.

No objections from me.
I would be unlikely to find time to work on moving this forward until after
RiSC-V Summit, so it‘s much appreciated if someone else rebases and merges
it.

asb accepted this revision.Nov 19 2021, 7:53 AM
This revision is now accepted and ready to land.Nov 19 2021, 7:53 AM
This revision was landed with ongoing or failed builds.Nov 19 2021, 8:40 AM
This revision was automatically updated to reflect the committed changes.