Details

Reviewers

evandro
frasercrmck
philipp.tomsich
asb

Commits

rGaf57a71d1871: [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk

Summary

On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:

  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]

due to an icmp not being sunk.

The blocks after codegenprepare look as follows:

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !4
  %cmp = icmp sgt i16 %0, -1
  %tobool.not37 = icmp eq %struct.list_head_s* %list, null
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph

where %tobool.not37 is the result of the icmp that is not sunk.
Note that it is computed in the entry block, which ends in what becomes
the bltz instruction, while the bnez that consumes it is in a basic block
of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):

define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !6
  %cmp = icmp sgt i16 %0, -1
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  %1 = icmp eq %struct.list_head_s* %list, null
  br i1 %1, label %return, label %land.rhs11.lr.ph

This is caused by sinkCmpExpression() being skipped when the target
reports multiple condition registers.
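
The effect of that early exit can be pictured with a small standalone model. This is only a sketch and not CodeGenPrepare's code: `ToyCmp` and `placeCmp` are hypothetical names, blocks are plain integers, and only branch uses in other blocks are modelled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for an icmp: the block that computes it and the blocks whose
// branches consume it. Block ids are plain integers (hypothetical model).
struct ToyCmp {
  int defBlock;
  std::vector<int> userBlocks;
};

// Returns the blocks that compute the compare after the toy sinking step.
// With the flag set the step is skipped and the compare stays where it was
// defined; otherwise each block that uses it gets its own copy.
std::set<int> placeCmp(const ToyCmp &cmp, bool hasMultipleConditionRegisters) {
  if (hasMultipleConditionRegisters || cmp.userBlocks.empty())
    return {cmp.defBlock};
  return std::set<int>(cmp.userBlocks.begin(), cmp.userBlocks.end());
}
```

Taking block 0 as `entry` and block 1 as `while.cond9.preheader`, `placeCmp({0, {1}}, true)` leaves the compare in block 0 (the RISC-V output above), while passing `false` places it in block 1 (the AArch64 output).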

Given that the check for multiple condition registers affects only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:

- we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp)
- we override shouldNormalizeToSelectSequence() so that our backend always selects its preferred normalisation strategy
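
The interaction of the two adjustments can be sketched with a self-contained toy model. Everything here is an assumption made for illustration: the `Toy*` classes and `wouldSinkCmp` are hypothetical, the real hooks take arguments and consider more than this one flag, and the override returning false follows the discussion further down rather than the diff itself.

```cpp
#include <cassert>

// Toy stand-in for the target hooks discussed in the summary.
class ToyTargetLowering {
public:
  virtual ~ToyTargetLowering() = default;

  bool hasMultipleConditionRegisters() const { return multipleCondRegs; }

  // Default policy assumed for this sketch: normalise to a select sequence
  // unless the target reports multiple condition registers.
  virtual bool shouldNormalizeToSelectSequence() const {
    return !hasMultipleConditionRegisters();
  }

protected:
  void setHasMultipleConditionRegisters(bool v = true) { multipleCondRegs = v; }

private:
  bool multipleCondRegs = false;
};

// Before the change: the flag is set, so the default hook returns false.
class ToyRISCVBefore : public ToyTargetLowering {
public:
  ToyRISCVBefore() { setHasMultipleConditionRegisters(); }
};

// After the change: the flag is left unset and the hook is overridden so the
// select normalisation decision stays the same as before.
class ToyRISCVAfter : public ToyTargetLowering {
public:
  bool shouldNormalizeToSelectSequence() const override { return false; }
};

// In this model, compare sinking is gated on the flag alone.
bool wouldSinkCmp(const ToyTargetLowering &tli) {
  return !tli.hasMultipleConditionRegisters();
}
```

In this model `wouldSinkCmp` flips from false to true between the two classes, while `shouldNormalizeToSelectSequence()` returns false for both, which matches the claim that only the icmp sinking changes.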

With both changes, the test results remain unchanged. Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:

  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

philipp.tomsich created this revision. Mar 19 2021, 2:33 AM

Herald added subscribers: vkmr, frasercrmck, evandro and 27 others. Mar 19 2021, 2:33 AM

philipp.tomsich requested review of this revision. Mar 19 2021, 2:33 AM

Herald added a project: Restricted Project. Mar 19 2021, 2:33 AM

Herald added subscribers: llvm-commits, MaskRay.

Removed comment left-over after removing the call to setHasMultipleConditionRegisters().

philipp.tomsich added reviewers: evandro, craig.topper, frasercrmck. Mar 19 2021, 2:38 AM

Harbormaster completed remote builds in B94646: Diff 331804. Mar 19 2021, 3:09 AM

Harbormaster completed remote builds in B94649: Diff 331807. Mar 19 2021, 3:22 AM

I'm surprised there aren't any tests affected by this. Are you able to add some? Perhaps you could pre-commit them so we can see the effect on codegen introduced by this patch.

Additionally, do we expect any regressions? I have tinkered with this setting with another backend I used to work on, and I saw mixed results on these kinds of benchmarks.

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is that future optimiser changes may lead to more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Added test case.

Harbormaster completed remote builds in B96690: Diff 334657. Apr 1 2021, 6:28 AM

Could you please clean up the test a bit? It contains references to attribute groups that don't actually exist (#0, #1). One attribute you do want to add is nounwind, to avoid the clutter caused by the CFI directives. In manually written tests we also generally don't include some of those dso_local, local_unnamed_addr, etc. In general, it would be nice to make it look less like Clang's output and more like something that can be easily read and reasoned about.

Simplified and cleaned up the test case.

Harbormaster completed remote builds in B97679: Diff 336045. Apr 8 2021, 3:41 AM

In D98932#2663612, @asb wrote:

It would definitely be good to get a test case for this as Fraser says.

I appreciate it's a bit hard to implement an alternate fix that doesn't involve the risk of regressing other backends, but it's not ideal to remove a call to a hook that really RISC-V should be setting (per the description of setHasMultipleConditionRegisters, this should be set for RISC-V).

This is a good point fix for the immediate problem, but the concern is that future optimiser changes may lead to more issues or missed optimisation opportunities down the road.

If we did want to do this, probably worth adding comments to explain that we'd like to setHasMultipleConditionRegisters ideally, but are choosing not to due to codegen regressions.

Hi Philipp - did you have any thoughts on whether an alternate fix may be viable (even if it involves adding a new hook)? As noted above, claiming not to have multiple condition registers when we do (at least according to the setHasMultipleConditionRegisters doc comment) isn't ideal.

[Looks like the mail-to-phabricator gateway truncated my reply, so here is the original message again that followed the "Alex," line.]

I did some digging and looked at the original commit that added this hook:

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimal.

So the assumption there was that a ConditionRegister would not simply be a register that holds a zero or non-zero value (as is true for RISC-V), but rather a register that has multiple condition codes encoded in it. This is simply not true for RISC-V (the same as it isn't true for MIPS), as we do not generate a Condition value, but rather a Boolean value. Note that MIPS also does not signal multiple condition registers (only PowerPC and AMDGPU do so).

I see two solutions:

1. We make it clear in the comments that this is meant for condition values that are not 'just truth values'.
2. We split the hook up and signal having multiple condition registers, but have a second hook that signals the 'quality' of condition registers (i.e. that these are merely booleans).
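
To make the second option concrete, here is a minimal sketch of how a split hook might look. It is purely hypothetical: `ToyConditionKind`, `ToyTarget` and `shouldSinkCmp` do not exist in LLVM and only illustrate the idea of separating "how many condition registers" from "what they hold".

```cpp
#include <cassert>

// Hypothetical second hook: what a "condition register" holds on a target.
enum class ToyConditionKind {
  ConditionCodes, // several condition bits packed into one register
  Boolean         // a plain zero / non-zero value in an ordinary register
};

// Hypothetical pair of target properties replacing the single flag.
struct ToyTarget {
  bool hasMultipleConditionRegisters;
  ToyConditionKind conditionKind;
};

// Compares are kept in place only when the target has multiple condition
// registers AND those registers hold real condition codes; targets whose
// compares produce plain booleans still get their compares sunk.
bool shouldSinkCmp(const ToyTarget &t) {
  return !(t.hasMultipleConditionRegisters &&
           t.conditionKind == ToyConditionKind::ConditionCodes);
}
```

Under this sketch a target could truthfully report multiple condition registers with `ToyConditionKind::Boolean` and still have its compares sunk, whereas a target reporting `ConditionCodes` would keep the current behaviour.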

I don't see the need to clarify the comment, as "condition register" has a meaning (at least to my understanding) that is obviously orthogonal to how truth values are generated in RISC-V... if the consensus is to clarify that this view of condition register should be used, I'll happily update the comment in include/llvm/CodeGen/TargetLowering.h

I just encountered some code that had multiple seqz instructions in one basic block that are used by branch-on-zero instructions in other blocks. I think it ended up leading to at least one extra spill in my case. So I was wondering if this is still being looked at.

Herald added a subscriber: jeroen.dobbelaere. May 21 2021, 6:49 PM

PaoloS added a subscriber: PaoloS. Jul 8 2021, 3:02 AM

I quite agree that hasMultipleConditionRegisters seems to be intended for targets like PowerPC (only when it treats the 32 bits of the condition register as separate entities).
It also seems to me that the hook was added to RISCV as a simpler way to make the RISCV backend lower select into more logic instructions and fewer branches, as explained here: https://reviews.llvm.org/D79268
This probably did not describe the condition-register situation of RISCV well, since it differs from that of PowerPC (and AMDGPU), for which the hook was made.
As already said, due to a previous commit, the hook hasMultipleConditionRegisters also influences (prevents) the sinking of compare instructions (sinkCmpExpression), and I found that sinking them gives better code size.
I ran the Embench benchmarks (https://github.com/embench/embench-iot) with a custom option I added (for proof of concept: https://reviews.llvm.org/D105620) and saw that turning off that hook gives better code size.
I think that the idea to create another hook that is specific to the lowering of the select instructions is good, if that's preferable to overriding shouldNormalizeToSelectSequence to just return false.
(Hope you don't mind that I extracted part of your test for the test of my proof of concept: https://reviews.llvm.org/D105620, it was neat)

Is this still being looked at? I see that by disabling HasMultipleConditionRegisters upstream we still get better code size with embench.

I thought that this was subsumed by a different change.
If we are still interested in it, I can pick it up again…

Hi @philipp.tomsich, do you know which change that is? Has it already been upstreamed? As far as I can see, after rebasing this patch onto upstream main we still get some code size reduction and the tests still pass.

In D98932#2997582, @PaoloS wrote:

Hi @philipp.tomsich, do you know which change that is? Has it already been upstreamed? As far as I can see, after rebasing this patch onto upstream main we still get some code size reduction and the tests still pass.

Sorry, this patch has fallen by the wayside somewhat.

It sounds like this change is triggering minor codegen improvements for multiple workloads, so if you can confirm it hasn't been subsumed by other work then I'm happy for it to land.

Herald added a subscriber: achieveartificialintelligence. Sep 30 2021, 5:56 AM

After reviewing, it is not subsumed by the other work — I originally
thought that some of the changes that Jessica had landed would have made
this unnecessary, but was wrong on that account.

jrtc27 added inline comments. Sep 30 2021, 6:01 AM

llvm/test/CodeGen/RISCV/sink-icmp.ll
1	Could you please show the diff to this test?

jrtc27 added inline comments. Sep 30 2021, 6:03 AM

llvm/test/CodeGen/RISCV/sink-icmp.ll
4	The RVxxI and RVxxIBT CHECK lines look identical? It's also not clear what this test is trying to test, could you please add a short summary to it so people know what regressions to look for?
4	(though maybe seeing the diff for this would make that obvious and any regressions would be similarly obvious?)

evandro added inline comments. Sep 30 2021, 7:14 AM

llvm/test/CodeGen/RISCV/sink-icmp.ll
1	@philipp.tomsich, please commit this test before the change so that this patch shows just its improvements to it. Apart from the other comments, LGTM.

craig.topper added inline comments. Oct 7 2021, 11:15 AM

llvm/test/CodeGen/RISCV/sink-icmp.ll
5	Use RV32ZBT or RV32IZBT. Don't drop the Z.

fpallares removed a subscriber: fpallares. Oct 14 2021, 4:09 AM

@philipp.tomsich I'm going to commandeer this and see if I can get it pushed forward.

Herald added subscribers: VincentWu, luke957. Nov 18 2021, 7:11 PM

craig.topper edited the summary of this revision. Nov 18 2021, 7:12 PM

Rebase

Harbormaster completed remote builds in B135029: Diff 388375. Nov 18 2021, 9:08 PM

Rebase to just show test change. I haven't pre-committed the test to the repo yet.

llvm/test/CodeGen/RISCV/sink-icmp.ll
5	I dropped Zbt since it would only apply if there was a select in the test.

Harbormaster completed remote builds in B135041: Diff 388390. Nov 18 2021, 10:18 PM

No objections from me.
I would be unlikely to find time to work on moving this forward until after
RISC-V Summit, so it's much appreciated if someone else rebases and merges
it.

asb accepted this revision. Nov 19 2021, 7:53 AM

This revision is now accepted and ready to land. Nov 19 2021, 7:53 AM

Herald added a subscriber: StephenFan. Nov 19 2021, 7:53 AM

This revision was landed with ongoing or failed builds. Nov 19 2021, 8:40 AM

Closed by commit rGaf57a71d1871: [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk (authored by philipp.tomsich, committed by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGaf57a71d1871: [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk.

craig.topper mentioned this in rG4b3518d50b30: [RISCV] Pre-commit test for D98932. NFC.

liaolucy mentioned this in D151180: [RISCV] select(C0, x, select(C1, x, y)) -> select(C0|C1, x, y). May 22 2023, 11:31 PM