This is an archive of the discontinued LLVM Phabricator instance.

Differential D127871

[RISCV] Optimize 2x SELECT for floating-point types
Closed, Public

Authored by liaolucy on Jun 15 2022, 9:10 AM.

Details

Reviewers

craig.topper
luismarques
asb
reames

Commits

rG3f68f0f8160e: [RISCV] Optimize 2x SELECT for floating-point types
rG1178992c72b0: [RISCV] Optimize 2x SELECT for floating-point types

Summary

This patch covers the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

liaolucy created this revision. Jun 15 2022, 9:10 AM

Herald added a project: Restricted Project. Jun 15 2022, 9:10 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 28 others.

liaolucy requested review of this revision. Jun 15 2022, 9:10 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B170020: Diff 437206. Jun 15 2022, 11:08 AM

The assembly for the test case before this patch:

        auipc   a0, %pcrel_hi(.LCPI0_0)
        addi    a0, a0, %pcrel_lo(.LBB0_5)
        flw     ft0, 0(a0)
        fmv.w.x ft1, zero
        flt.s   a1, fa0, ft1
        flt.s   a0, ft0, fa0
        beqz    a1, .LBB0_3
# %bb.1:                                # %entry
        beqz    a0, .LBB0_4
.LBB0_2:                                # %entry
        fmv.s   fa0, ft0
        ret
.LBB0_3:                                # %entry
        fmv.s   ft1, fa0
        bnez    a0, .LBB0_2
.LBB0_4:                                # %entry
        fmv.s   ft0, ft1
        fmv.s   fa0, ft0
        ret

After this patch:

# %bb.0:                                # %entry
        fmv.w.x ft0, zero
        flt.s   a0, fa0, ft0
        bnez    a0, .LBB0_3
# %bb.1:                                # %entry
.LBB0_4:                                # %entry
                                        # Label of block must be emitted
        auipc   a0, %pcrel_hi(.LCPI0_0)
        addi    a0, a0, %pcrel_lo(.LBB0_4)
        flw     ft0, 0(a0)
        flt.s   a0, ft0, fa0
        bnez    a0, .LBB0_3
# %bb.2:                                # %entry
        fmv.s   ft0, fa0
.LBB0_3:                                # %entry
        fmv.s   fa0, ft0
        ret
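
For orientation, the test case clamps a float to the range [0, 1] with two chained selects. The following is only a source-level sketch of that shape, not code from the patch; the function names are hypothetical. The second function mirrors the branch structure of the assembly above: two conditional branches to a single join point.

```cpp
#include <cassert>

// Source-level view of the two chained selects in the test case:
//   %.a       = select (a < 0.0), 0.0, a
//   %retval.0 = select (a > 1.0), 1.0, %.a
float cascadedSelectReference(float a) {
  float inner = (a < 0.0f) ? 0.0f : a;
  return (a > 1.0f) ? 1.0f : inner;
}

// Shape of the fused lowering shown above (hypothetical model):
// both compares branch to the same join block.
float cascadedSelectFused(float a) {
  float result = 0.0f;
  if (a < 0.0f)
    goto sink;        // first flt.s + bnez
  result = 1.0f;      // constant loaded from the constant pool
  if (1.0f < a)
    goto sink;        // second flt.s + bnez
  result = a;         // fallthrough block
sink:
  return result;
}
```

In this particular test the two conditions can never both be true, so the order in which they are tested does not change the result. That is not true in general, which is what the later discussion in this review is about.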

craig.topper added inline comments. Jun 18 2022, 12:02 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9698	Variable names should start with a capital letter
9705	Why do we need an explicit PseudoBR? Can't we let it fallthrough?
9775	Should this use `next_nodbg` to increment?
9777	Shouldn't we be checking `NextMIIt != BB->end()` before checking the opcode of `NextMIIt`?
llvm/test/CodeGen/RISCV/select-optimize-multiple.ll
537	Drop dso_local and local_unnamed_addr and `#0`

Addressed craig.topper's comments. Thanks.

Harbormaster completed remote builds in B170770: Diff 438263. Jun 20 2022, 12:00 AM

LGTM

This revision is now accepted and ready to land. Jun 27 2022, 9:35 AM

craig.topper mentioned this in D128124: [X86] Remove unnecessary COPY from EmitLoweredCascadedSelect. Jun 27 2022, 5:41 PM

This revision was landed with ongoing or failed builds. Jun 27 2022, 9:02 PM

Closed by commit rG1178992c72b0: [RISCV] Optimize 2x SELECT for floating-point types (authored by liaolucy).

This revision was automatically updated to reflect the committed changes.

liaolucy added a commit: rG1178992c72b0: [RISCV] Optimize 2x SELECT for floating-point types.

Hi!

This commit is causing a correctness regression in one of the ML models in IREE. Please find attached the LLVM IR after CodeGenPrepare. Hopefully you can easily identify where this change is triggering.
To reproduce: llc bug.ll -mcpu=generic-rv64 -mattr=+m,+a,+f,+d,+c -target-abi=lp64d. Please note that +v is not provided. Could we please consider a revert while this is being investigated?

Thanks,
Diego

bug.ll (1 MB)

@dcaballe Can you provide a file that can be compiled and run? That would help me debug.

I was able to reduce the test case to:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
target triple = "riscv64-unknown-unknown-eabi-elf"

; Function Attrs: nofree nosync nounwind
define internal i32 @main_dispatch_72(i8* %i5, <2 x float> %i44, <2 x float> %i61, <2 x i32> %i65) {
bbi:
  %i45 = fcmp uno <2 x float> %i44, zeroinitializer
  %i62 = fcmp oeq <2 x float> %i44, <float 0xFFF0000000000000, float 0xFFF0000000000000>
  %i63 = fcmp oeq <2 x float> %i44, <float 0x7FF0000000000000, float 0x7FF0000000000000>
  %i64 = fcmp ogt <2 x float> %i44, zeroinitializer
  %i66 = icmp ult <2 x i32> %i65, <i32 255, i32 255>
  %i67 = select <2 x i1> %i64, <2 x float> <float 0x7FF0000000000000, float 0x7FF0000000000000>, <2 x float> <float 0x3810000000000000, float 0x3810000000000000>
  %i68 = select <2 x i1> %i66, <2 x float> %i61, <2 x float> %i67
  %i69 = select <2 x i1> %i63, <2 x float> <float 0x7FF0000000000000, float 0x7FF0000000000000>, <2 x float> %i68
  %i70 = select <2 x i1> %i62, <2 x float> zeroinitializer, <2 x float> %i69
  %i71 = select <2 x i1> %i45, <2 x float> %i44, <2 x float> %i70
  %i72 = bitcast i8* %i5 to <2 x float>*
  store <2 x float> %i71, <2 x float>* %i72, align 64
  ret i32 0
}

Your code should trigger with llc bug.ll -mcpu=generic-rv64 -mattr=+m,+a,+f,+d,+c -target-abi=lp64d.

It looks like it's a case where your transformation is applied multiple times on the output produced by previous instances.

I'll revert this commit while this is being investigated.

Thanks,
Diego

dcaballe added a reverting change: rGbf1758c3dc4f: Revert "[RISCV] Optimize 2x SELECT for floating-point types". Jul 7 2022, 3:56 PM

I think the expanded branches are emitted in the wrong order. The condition for the last Select_FPR32_Using_CC_GPR needs to be checked first.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9775	I think we also need to make sure the condition for the second select doesn't use the output from the first select.

craig.topper reopened this revision. Jul 7 2022, 6:07 PM

This revision is now accepted and ready to land. Jul 7 2022, 6:07 PM

craig.topper requested changes to this revision. Jul 7 2022, 6:07 PM

This revision now requires changes to proceed. Jul 7 2022, 6:07 PM

In D127871#3637493, @craig.topper wrote:

I think the expanded branches are emitted in the wrong order. The condition for the last Select_FPR32_Using_CC_GPR needs to be checked first.

Thanks, Craig. I dumped all the selects. The middle three pairs of instructions can be optimized:

  %25:fpr32 = Select_FPR32_Using_CC_GPR killed %21:gpr, %24:gpr, 1, %16:fpr32, %23:fpr32
  %26:fpr32 = Select_FPR32_Using_CC_GPR killed %20:gpr, %24:gpr, 1, %16:fpr32, %23:fpr32
  %28:fpr32 = Select_FPR32_Using_CC_GPR killed %8:gpr, %27:gpr, 4, %4:fpr32, killed %26:fpr32

optimize1 
 1.1 %29:fpr32 = Select_FPR32_Using_CC_GPR killed %7:gpr, %27:gpr, 4, %3:fpr32, killed %25:fpr32
 2.1 %30:fpr32 = Select_FPR32_Using_CC_GPR killed %18:gpr, %24:gpr, 1, %16:fpr32, killed %29:fpr32
optimize2
 1.2 %31:fpr32 = Select_FPR32_Using_CC_GPR killed %17:gpr, %24:gpr, 1, %16:fpr32, killed %28:fpr32
 2.2 %32:fpr32 = Select_FPR32_Using_CC_GPR killed %14:gpr, %24:gpr, 1, %19:fpr32, killed %31:fpr32
optimize3
 1.3 %33:fpr32 = Select_FPR32_Using_CC_GPR killed %13:gpr, %24:gpr, 1, %19:fpr32, killed %30:fpr32
 2.3 %34:fpr32 = Select_FPR32_Using_CC_GPR killed %10:gpr, %24:gpr, 0, %1:fpr32, killed %33:fpr32

  %35:fpr32 = Select_FPR32_Using_CC_GPR killed %9:gpr, %24:gpr, 0, %2:fpr32, killed %32:fpr32

I think we should check select1-rs2 == select2-rs2, so that 1.1 and 2.1 are not optimized. I will update the patch later.

I don't think checking select1-rs2 == select2-rs2 solves the problem. When the condition of the second select is true, the true value of that select has priority. That means when the select2 condition is true, the select1 condition doesn't matter. The patch is trying to prioritize one of the compares so that the other can be skipped. That means you need to check the select2 condition first to maintain the priority.
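
The priority argument can be checked with a small truth table. This is only a sketch with hypothetical function names and plain integers standing in for the selected values; it is not code from the patch.

```cpp
#include <cassert>

// Reference semantics of two chained selects, where the first select's
// result is the false operand of the second:
//   first  = cond1 ? t1 : f
//   second = cond2 ? t2 : first
int chainedSelect(bool cond1, bool cond2, int t1, int t2, int f) {
  int first = cond1 ? t1 : f;
  return cond2 ? t2 : first;
}

// Branch sequence that tests the second select's condition first.
int branchSecondFirst(bool cond1, bool cond2, int t1, int t2, int f) {
  if (cond2)
    return t2;
  if (cond1)
    return t1;
  return f;
}

// Branch sequence that tests the first select's condition first.
// It disagrees with chainedSelect when both conditions are true.
int branchFirstFirst(bool cond1, bool cond2, int t1, int t2, int f) {
  if (cond1)
    return t1;
  if (cond2)
    return t2;
  return f;
}
```

With both conditions true, `chainedSelect` and `branchSecondFirst` yield `t2`, while `branchFirstFirst` yields `t1`.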

liaolucy added inline comments. Jul 8 2022, 12:20 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9775	I'm still confused. Do you mean I should add: Next->getOperand(4).getReg() != MI.getOperand(0).getReg()? With that check, the assembly for bug.ll does not change.

craig.topper added inline comments. Jul 8 2022, 12:43 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9775	I was theorizing about another possible bug. If either rs1 or rs2 of the second select is the result of the first select, meaning one of the compare operands is also the false operand, it's not safe to do the optimization. Is that already protected against?

liaolucy added inline comments. Jul 8 2022, 12:53 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9775	"If either rs1 or rs2 of the second select is the result of the first select": this condition cannot hold. E.g. in fpr32 = Select_FPR32_Using_CC_GPR killed %13:gpr, %24:gpr, 1, %19:fpr32, killed %30:fpr32, rs1 and rs2 are gpr, but dst is fpr.

craig.topper added inline comments. Jul 8 2022, 9:08 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9775	You're right. I forgot this was FP only.
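
The guard that decides whether two adjacent selects are fused can be modeled roughly as follows. This is a simplified sketch based on the checks visible in the diff for emitSelectPseudo (opcode is not the GPR select, both opcodes match, operand 5 of the second select is the killed result of the first). `SelectPseudo`, `Opcode` and `canFuseCascaded` are hypothetical stand-ins, not the LLVM API.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a Select_*_Using_CC_GPR pseudo.
// Register numbers are plain unsigned values.
enum class Opcode { SelectGPR, SelectFPR16, SelectFPR32, SelectFPR64 };

struct SelectPseudo {
  Opcode opcode;
  unsigned dst;      // operand 0: result register
  unsigned lhs;      // operand 1: compared GPR (rs1)
  unsigned rhs;      // operand 2: compared GPR (rs2)
  unsigned cc;       // operand 3: condition code immediate
  unsigned trueVal;  // operand 4
  unsigned falseVal; // operand 5
  bool falseIsKill;  // kill flag on operand 5
};

// Mirrors the guard in the diff: only FP selects are considered, the second
// select must have the same opcode, and its false operand must be the
// (killed) result of the first select.
bool canFuseCascaded(const SelectPseudo &first, const SelectPseudo &second) {
  return first.opcode != Opcode::SelectGPR &&
         second.opcode == first.opcode &&
         second.falseVal == first.dst &&
         second.falseIsKill;
}
```

Because the GPR select is excluded, the compared registers are GPRs while the result is an FPR, so the result of the first select cannot reappear as rs1 or rs2 of the second.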

Happy to run any tentative fix that you may have on the full model to know if it fixes the problem.

I tried a fix based on my guess. @dcaballe, could you help test it? If it doesn't solve the problem, I may need to spend more time analyzing.

Harbormaster completed remote builds in B174489: Diff 443400. Jul 8 2022, 7:23 PM

It works! I must have messed up the test when trimming it. I'm attaching the whole function for your reference. Your code was invoked four times on this function with the previous version of the code. It's only invoked twice with the fixed one.

function.ll (8 KB)

Thanks!

craig.topper added inline comments. Jul 9 2022, 1:26 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9777	I don't understand the new check. Why was it wrong to optimize 1.1 and 2.1 in the failed case?

liaolucy added inline comments. Jul 9 2022, 10:50 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

9777	Yesterday, I saw that x86 has this check.

// CMOV ((CMOV F, T, cc1), T, cc2) is checked here and handled by a separate
// function - EmitLoweredCascadedSelect

// This checks for case 2, but only do this if we didn't already find
// case 1, as indicated by LastCMOV == MI.
if (LastCMOV == &MI && NextMIIt != ThisMBB->end() &&
    NextMIIt->getOpcode() == MI.getOpcode() &&
    NextMIIt->getOperand(2).getReg() == MI.getOperand(2).getReg() &&
    NextMIIt->getOperand(1).getReg() == MI.getOperand(0).getReg() &&
    NextMIIt->getOperand(1).isKill()) {
  return EmitLoweredCascadedSelect(MI, *NextMIIt, ThisMBB);
}

Trying to understand:
1.1 %29:fpr32 = Select_FPR32_Using_CC_GPR killed %7:gpr, %27:gpr, 4, %3:fpr32, killed %25:fpr32
2.1 %30:fpr32 = Select_FPR32_Using_CC_GPR killed %18:gpr, %24:gpr, 1, %16:fpr32, killed %29:fpr32

The compare operands are %7 (a), %27 (b) for 1.1 and %18 (c), %24 (d) for 2.1.
E.g. in general there are four cases:

 a=b,  c=d
 a=b,  c!=d
 a!=b, c=d
 a!=b, c!=d

When b=d, i.e. %7 (a), %27 (b) and %18 (c), %24 (b):

 a=b,  c=b     a=b=c
 a=b,  c!=b    a!=c (optimize)
 a!=b, c=b
 a!=b, c!=b    a!=b!=c

craig.topper added inline comments. Jul 9 2022, 11:13 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9777	X86 has the CMP done as a separate instruction that writes EFLAGS. The getOperand(2) check for X86 is making sure that the EFLAGS come from the same CMP instruction. RISC-V does the comparison as part of the branch, so it's different.

For

1.1 %29:fpr32 = Select_FPR32_Using_CC_GPR killed %7:gpr, %27:gpr, 4, %3:fpr32, killed %25:fpr32
2.1 %30:fpr32 = Select_FPR32_Using_CC_GPR killed %18:gpr, %24:gpr, 1, %16:fpr32, killed %29:fpr32

The original code is

(%18 != %24) ? %16 : ((%7 < %27) ? %3 : %25)

The transform this patch is trying to do needs to be

bb1:
  BNE %18, %24, bb4
bb2:
  BLTU %7, %27, bb4
bb3:
  // fallthrough
bb4:
  phi %16, bb1, %3, bb2, %25, bb3

The condition of the second select needs to be checked first.
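
The block layout in the comment above can be modeled in plain C++ by recording which block control arrives from, since that is what the PHI keys on. This is only an illustrative sketch; `Pred` and `loweredCascade` are hypothetical names, and the parameters stand in for the virtual registers %18, %24, %7, %27, %16, %3 and %25.

```cpp
#include <cassert>
#include <cstdint>

// Which predecessor block control arrived from at bb4.
enum class Pred { BB1, BB2, BB3 };

float loweredCascade(uint64_t r18, uint64_t r24, uint64_t r7, uint64_t r27,
                     float v16, float v3, float v25) {
  Pred from;
  if (r18 != r24) {
    from = Pred::BB1; // bb1: BNE %18, %24, bb4
  } else if (r7 < r27) {
    from = Pred::BB2; // bb2: BLTU %7, %27, bb4 (unsigned compare)
  } else {
    from = Pred::BB3; // bb3: fallthrough
  }
  // bb4: phi [%16, bb1], [%3, bb2], [%25, bb3]
  switch (from) {
  case Pred::BB1:
    return v16;
  case Pred::BB2:
    return v3;
  case Pred::BB3:
    return v25;
  }
  return v25; // not reached; keeps compilers from warning about a missing return
}
```

The result matches the nested expression `(%18 != %24) ? %16 : ((%7 < %27) ? %3 : %25)` for every combination of the two conditions, because the second select's condition is tested in the first block.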

Addressed craig.topper's comments. Thanks.

Harbormaster completed remote builds in B174549: Diff 443489. Jul 10 2022, 4:46 AM

LGTM

This revision is now accepted and ready to land. Jul 10 2022, 10:45 PM

This revision was landed with ongoing or failed builds. Jul 10 2022, 11:10 PM

Closed by commit rG3f68f0f8160e: [RISCV] Optimize 2x SELECT for floating-point types (authored by liaolucy).

This revision was automatically updated to reflect the committed changes.

liaolucy added a commit: rG3f68f0f8160e: [RISCV] Optimize 2x SELECT for floating-point types.

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	114 lines
llvm/test/CodeGen/RISCV/select-optimize-multiple.ll	76 lines

Diff 438263

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

(9,622 preceding lines not shown; hunk context: emitQuietFCMP)
		if (MI.getFlag(MachineInstr::MIFlag::NoFPExcept))
		MIB2->setFlag(MachineInstr::MIFlag::NoFPExcept);

		// Erase the pseudoinstruction.
		MI.eraseFromParent();
		return BB;
		}

		static MachineBasicBlock *
		EmitLoweredCascadedSelect(MachineInstr &First, MachineInstr &Second,
		MachineBasicBlock *ThisMBB,
		const RISCVSubtarget &Subtarget) {
		// Select_FPRX_ (rs1, rs2, imm, rs4, (Select_FPRX_ rs1, rs2, imm, rs4, rs5))
		// Without this, custom-inserter would have generated:
		//
		//       A
		//       | \
		//       |  B
		//       | /
		//       C
		//       | \
		//       |  D
		//       | /
		//       E
		//
		// A: X = ...; Y = ...
		// B: empty
		// C: Z = PHI [X, A], [Y, B]
		// D: empty
		// E: PHI [X, C], [Z, D]
		//
		// If we lower both Select_FPRX_ in a single step, we can instead generate:
		//
		//       A
		//       | \
		//       |  C
		//       | /|
		//       |/ |
		//       |  |
		//       |  D
		//       | /
		//       E
		//
		// A: X = ...; Y = ...
		// D: empty
		// E: PHI [X, A], [X, C], [Y, D]

		const RISCVInstrInfo &TII = *Subtarget.getInstrInfo();
		const DebugLoc &DL = First.getDebugLoc();
		const BasicBlock *LLVM_BB = ThisMBB->getBasicBlock();
		MachineFunction *F = ThisMBB->getParent();
		MachineBasicBlock *FirstMBB = F->CreateMachineBasicBlock(LLVM_BB);
		MachineBasicBlock *SecondMBB = F->CreateMachineBasicBlock(LLVM_BB);
		MachineBasicBlock *SinkMBB = F->CreateMachineBasicBlock(LLVM_BB);
		MachineFunction::iterator It = ++ThisMBB->getIterator();
		F->insert(It, FirstMBB);
		F->insert(It, SecondMBB);
		F->insert(It, SinkMBB);

		// Transfer the remainder of ThisMBB and its successor edges to SinkMBB.
		SinkMBB->splice(SinkMBB->begin(), ThisMBB,
		std::next(MachineBasicBlock::iterator(First)),
		ThisMBB->end());
		SinkMBB->transferSuccessorsAndUpdatePHIs(ThisMBB);

		// Fallthrough block for ThisMBB.
		ThisMBB->addSuccessor(FirstMBB);
		// Fallthrough block for FirstMBB.
		FirstMBB->addSuccessor(SecondMBB);
		ThisMBB->addSuccessor(SinkMBB);
		FirstMBB->addSuccessor(SinkMBB);
		// This is fallthrough.
		SecondMBB->addSuccessor(SinkMBB);

		auto FirstCC = static_cast<RISCVCC::CondCode>(First.getOperand(3).getImm());
		Register FLHS = First.getOperand(1).getReg();
		Register FRHS = First.getOperand(2).getReg();
		// Insert appropriate branch.
		BuildMI(ThisMBB, DL, TII.getBrCond(FirstCC))
		.addReg(FLHS)
		.addReg(FRHS)
		.addMBB(SinkMBB);

		Register SLHS = Second.getOperand(1).getReg();
		Register SRHS = Second.getOperand(2).getReg();
		Register Op1Reg4 = First.getOperand(4).getReg();
		Register Op1Reg5 = First.getOperand(5).getReg();

		auto SecondCC = static_cast<RISCVCC::CondCode>(Second.getOperand(3).getImm());
		// Insert appropriate branch.
		BuildMI(FirstMBB, DL, TII.getBrCond(SecondCC))
		.addReg(SLHS)
		.addReg(SRHS)
		.addMBB(SinkMBB);

		Register DestReg = Second.getOperand(0).getReg();
		Register Op2Reg4 = Second.getOperand(4).getReg();
		BuildMI(*SinkMBB, SinkMBB->begin(), DL, TII.get(RISCV::PHI), DestReg)
		.addReg(Op1Reg4)
		.addMBB(ThisMBB)
		.addReg(Op2Reg4)
		.addMBB(FirstMBB)
		.addReg(Op1Reg5)
		.addMBB(SecondMBB);

		// Now remove the Select_FPRX_s.
		First.eraseFromParent();
		Second.eraseFromParent();
		return SinkMBB;
		}

		static MachineBasicBlock *emitSelectPseudo(MachineInstr &MI,
		MachineBasicBlock *BB,
		const RISCVSubtarget &Subtarget) {
		// To "insert" Select_* instructions, we actually have to insert the triangle
		// control-flow pattern. The incoming instructions know the destination vreg
		// to set, the condition code register to branch on, the true/false values to
		// select between, and the condcode to use to select the appropriate branch.
		//
		(11 lines not shown)
		// instructions meet some requirements we deem safe:
		// - They are debug instructions. Otherwise,
		// - They do not have side-effects, do not access memory and their inputs do
		// not depend on the results of the select pseudo-instructions.
		// The TrueV/FalseV operands of the selects cannot depend on the result of
		// previous selects in the sequence.
		// These conditions could be further relaxed. See the X86 target for a
		// related approach and more information.
		//
		// Select_FPRX_ (rs1, rs2, imm, rs4, (Select_FPRX_ rs1, rs2, imm, rs4, rs5))
		// is checked here and handled by a separate function -
		// EmitLoweredCascadedSelect.
		Register LHS = MI.getOperand(1).getReg();
		Register RHS = MI.getOperand(2).getReg();
		auto CC = static_cast<RISCVCC::CondCode>(MI.getOperand(3).getImm());

		SmallVector<MachineInstr *, 4> SelectDebugValues;
		SmallSet<Register, 4> SelectDests;
		SelectDests.insert(MI.getOperand(0).getReg());

		MachineInstr *LastSelectPseudo = &MI;
		auto Next = next_nodbg(MI.getIterator(), BB->instr_end());
		if (MI.getOpcode() != RISCV::Select_GPR_Using_CC_GPR && Next != BB->end() &&
		Next->getOpcode() == MI.getOpcode() &&
		Next->getOperand(5).getReg() == MI.getOperand(0).getReg() &&
		Next->getOperand(5).isKill()) {
		return EmitLoweredCascadedSelect(MI, *Next, BB, Subtarget);
		}

		for (auto E = BB->end(), SequenceMBBI = MachineBasicBlock::iterator(MI);
		SequenceMBBI != E; ++SequenceMBBI) {
		if (SequenceMBBI->isDebugInstr())
		continue;
		if (isSelectPseudo(*SequenceMBBI)) {
		if (SequenceMBBI->getOperand(1).getReg() != LHS ||
		SequenceMBBI->getOperand(2).getReg() != RHS ||
		(2,446 following lines not shown)

llvm/test/CodeGen/RISCV/select-optimize-multiple.ll

				(527 preceding lines not shown)
				; RV64IBT-NEXT: addw a0, a0, a1
				; RV64IBT-NEXT: ret
				entry:
				%cond1 = select i1 %a, i32 %c, i32 %d
				%cond2 = select i1 %b, i32 %e, i32 %f
				%ret = add i32 %cond1, %cond2
				ret i32 %ret
				}

				define float @CascadedSelect(float noundef %a) {
				; RV32I-LABEL: CascadedSelect:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: fmv.w.x ft0, a0
				; RV32I-NEXT: fmv.w.x ft1, zero
				; RV32I-NEXT: flt.s a0, ft0, ft1
				; RV32I-NEXT: bnez a0, .LBB8_3
				; RV32I-NEXT: # %bb.1: # %entry
				; RV32I-NEXT: lui a0, %hi(.LCPI8_0)
				; RV32I-NEXT: flw ft1, %lo(.LCPI8_0)(a0)
				; RV32I-NEXT: flt.s a0, ft1, ft0
				; RV32I-NEXT: bnez a0, .LBB8_3
				; RV32I-NEXT: # %bb.2: # %entry
				; RV32I-NEXT: fmv.s ft1, ft0
				; RV32I-NEXT: .LBB8_3: # %entry
				; RV32I-NEXT: fmv.x.w a0, ft1
				; RV32I-NEXT: ret
				;
				; RV32IBT-LABEL: CascadedSelect:
				; RV32IBT: # %bb.0: # %entry
				; RV32IBT-NEXT: fmv.w.x ft0, a0
				; RV32IBT-NEXT: fmv.w.x ft1, zero
				; RV32IBT-NEXT: flt.s a0, ft0, ft1
				; RV32IBT-NEXT: bnez a0, .LBB8_3
				; RV32IBT-NEXT: # %bb.1: # %entry
				; RV32IBT-NEXT: lui a0, %hi(.LCPI8_0)
				; RV32IBT-NEXT: flw ft1, %lo(.LCPI8_0)(a0)
				; RV32IBT-NEXT: flt.s a0, ft1, ft0
				; RV32IBT-NEXT: bnez a0, .LBB8_3
				; RV32IBT-NEXT: # %bb.2: # %entry
				; RV32IBT-NEXT: fmv.s ft1, ft0
				; RV32IBT-NEXT: .LBB8_3: # %entry
				; RV32IBT-NEXT: fmv.x.w a0, ft1
				; RV32IBT-NEXT: ret
				;
				; RV64I-LABEL: CascadedSelect:
				; RV64I: # %bb.0: # %entry
				; RV64I-NEXT: fmv.w.x ft0, a0
				; RV64I-NEXT: fmv.w.x ft1, zero
				; RV64I-NEXT: flt.s a0, ft0, ft1
				; RV64I-NEXT: bnez a0, .LBB8_3
				; RV64I-NEXT: # %bb.1: # %entry
				; RV64I-NEXT: lui a0, %hi(.LCPI8_0)
				; RV64I-NEXT: flw ft1, %lo(.LCPI8_0)(a0)
				; RV64I-NEXT: flt.s a0, ft1, ft0
				; RV64I-NEXT: bnez a0, .LBB8_3
				; RV64I-NEXT: # %bb.2: # %entry
				; RV64I-NEXT: fmv.s ft1, ft0
				; RV64I-NEXT: .LBB8_3: # %entry
				; RV64I-NEXT: fmv.x.w a0, ft1
				; RV64I-NEXT: ret
				;
				; RV64IBT-LABEL: CascadedSelect:
				; RV64IBT: # %bb.0: # %entry
				; RV64IBT-NEXT: fmv.w.x ft0, a0
				; RV64IBT-NEXT: fmv.w.x ft1, zero
				; RV64IBT-NEXT: flt.s a0, ft0, ft1
				; RV64IBT-NEXT: bnez a0, .LBB8_3
				; RV64IBT-NEXT: # %bb.1: # %entry
				; RV64IBT-NEXT: lui a0, %hi(.LCPI8_0)
				; RV64IBT-NEXT: flw ft1, %lo(.LCPI8_0)(a0)
				; RV64IBT-NEXT: flt.s a0, ft1, ft0
				; RV64IBT-NEXT: bnez a0, .LBB8_3
				; RV64IBT-NEXT: # %bb.2: # %entry
				; RV64IBT-NEXT: fmv.s ft1, ft0
				; RV64IBT-NEXT: .LBB8_3: # %entry
				; RV64IBT-NEXT: fmv.x.w a0, ft1
				; RV64IBT-NEXT: ret
				entry:
				%cmp = fcmp ogt float %a, 1.000000e+00
				%cmp1 = fcmp olt float %a, 0.000000e+00
				%.a = select i1 %cmp1, float 0.000000e+00, float %a
				%retval.0 = select i1 %cmp, float 1.000000e+00, float %.a
				ret float %retval.0
				}
