Page MenuHomePhabricator

[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
Needs ReviewPublic

Authored by philipp.tomsich on Mar 19 2021, 2:33 AM.

Details

Summary

On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:

core_list_find:

lh a2, 2(a1)
seqz a3, a0 <<
bltz a2, .LBB0_5
bnez a3, .LBB0_9 << should sink the seqz

[...]

j .LBB0_9

.LBB0_5:

bnez a3, .LBB0_9 << should sink the seqz
lh a1, 0(a1)

[...]

due to an icmp not being sunk.

The blocks after codegenprepare look as follows:

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !4
  %cmp = icmp sgt i16 %0, -1
  %tobool.not37 = icmp eq %struct.list_head_s* %list, null
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph

where the %tobool.not37 is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
bltz instruction and the bnez is a basic-block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !6
  %cmp = icmp sgt i16 %0, -1
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  %1 = icmp eq %struct.list_head_s* %list, null
  br i1 %1, label %return, label %land.rhs11.lr.ph

This is caused by sinkCmpExpression() being skippedd, if multiple
condition registers are supported.

Given that the check for multiple condition registers affect only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:

  • we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp)
  • we override shouldNormalizeToSelectSequence() to let always select the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged. Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:

core_list_find:

lh a2, 2(a1)
bltz a2, .LBB0_5
beqz a0, .LBB0_9 <<

[...]

j .LBB0_9
.LBB0_5:
beqz a0, .LBB0_9 <<
lh a1, 0(a1)

[...]

Diff Detail

Event Timeline

philipp.tomsich requested review of this revision.Mar 19 2021, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2021, 2:33 AM

Removed comment left-over after removing the call to setHasMultipleConditionRegisters().

I'm surprised there aren't any tests affected by this. Are you able to add some? Perhaps you could pre-commit them so we can see the effect on codegen introduced by this patch.

Additionally, do we expect any regressions? I have tinkered with this setting with another backend I used to work on, and I saw mixed results on these kinds of benchmarks.

asb added a comment.Apr 1 2021, 5:25 AM

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is just that future optimiser changes means there are more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Added test case.

Could you please clean up the test a bit? It contains references to attribute groups that don't actually exist (#0, #1). One attribute you do want to add is nounwind, to avoid the clutter caused by the CFI directives. In manually written tests we also generally don't include some of those dso_local, local_unnamed_addr, etc. In general, it would be nice to make it look less like Clang's output and more like something that can be easily read and reasoned about.

Simplified and cleaned up the test case.

asb added a comment.Thu, Apr 15, 3:38 AM
In D98932#2663612, @asb wrote:

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is just that future optimiser changes means there are more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Hi Philipp - did you have any thoughts on whether an alternate fix may be viable (even if it involves adding a new hook)? As noted above, claiming not to have multiple condition registers when we do (at least according to the setHasMultipleConditionRegisters doc comment) isn't ideal.

[Looks like the mail-to-phabricator gateway truncated my reply, so here is the original message again that followed the "Alex," line.]

I did some digging and looked at the original commit that added this hook:

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimial.

So the assumption there was that a ConditionRegister would not simply be a register that holds a zero or non-zero value (as is true for RISC-V), but rather a register that has multiple condition codes encoded in it. This is simply not true for RISC-V (the same as it isn't true for MIPS), as we do not generate a Condition value, but rather a Boolean value. Note that MIPS also does not signal multiple condition registers (only PowerPC and AMDGPU do so).

I see two solutions:

  1. We make it clear in the comments that this is meant for condition values that are not 'just truth values'.
  2. We split the hook up and signal having multiple condition registers, but have a second hook that signals the 'quality' of condition registers (i.e. that these are merely booleans).

I don't see the need to clarify the comment, as "condition register" has a meaning (at least to my understanding) that is obviously orthogonal to how truth values are generated in RISC-V... if the consensus is to clarify that this view of condition register should be used, I'll happily update the comment in include/llvm/CodeGen/TargetLowering.h