On RISC-V, the icmp is not sunk (as the snippet below shows), which
generates the following suboptimal branch pattern:
```asm
core_list_find:
        lh      a2, 2(a1)
        seqz    a3, a0          <<
        bltz    a2, .LBB0_5
        bnez    a3, .LBB0_9     << should sink the seqz
        [...]
        j       .LBB0_9
.LBB0_5:
        bnez    a3, .LBB0_9     << should sink the seqz
        lh      a1, 0(a1)
        [...]
```
This pattern arises because the icmp is not sunk.
The basic blocks after CodeGenPrepare look as follows:
```llvm
define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !4
  %cmp = icmp sgt i16 %0, -1
  %tobool.not37 = icmp eq %struct.list_head_s* %list, null
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
Here %tobool.not37 is the result of the icmp that is not sunk.
Note that it is computed in the basic block that ends in what becomes
the bltz instruction, while the bnez ends up in a basic block of its own.
Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```llvm
define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
  %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
  %0 = load i16, i16* %idx, align 2, !tbaa !6
  %cmp = icmp sgt i16 %0, -1
  br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

while.cond9.preheader:                            ; preds = %entry
  %1 = icmp eq %struct.list_head_s* %list, null
  br i1 %1, label %return, label %land.rhs11.lr.ph
```
This is caused by sinkCmpExpression() being skipped when the target
reports multiple condition registers.
Given that the check for multiple condition registers affects only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
- we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp)
- we override shouldNormalizeToSelectSequence() to always select the preferred normalisation strategy for our backend
With both changes, the test results remain unchanged. Note that without
the target-specific override to shouldNormalizeToSelectSequence(),
worse code (more branches) is generated for select-and.ll and select-or.ll.
The original test case changes as expected:
```asm
core_list_find:
        lh      a2, 2(a1)
        bltz    a2, .LBB0_5
        beqz    a0, .LBB0_9     <<
        [...]
        j       .LBB0_9
.LBB0_5:
        beqz    a0, .LBB0_9     <<
        lh      a1, 0(a1)
        [...]
```
Could you please show the diff for this test?