This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize xor/or with immediate in the zbs extension
ClosedPublic

Authored by benshi001 on May 20 2021, 6:14 PM.

Details

Reviewers

craig.topper
luismarques
MaskRay
asb
jrtc27

Commits

rGbf77317049a8: [RISCV] Optimize xor/or with immediate in the zbs extension

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

benshi001 created this revision. May 20 2021, 6:14 PM

Herald added subscribers: vkmr, frasercrmck, evandro and 22 others. May 20 2021, 6:14 PM

benshi001 requested review of this revision. May 20 2021, 6:14 PM

Herald added a project: Restricted Project. May 20 2021, 6:14 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B105544: Diff 346900. May 20 2021, 6:15 PM

benshi001 added a parent revision: D102854: [RISCV][test] Add new tests of or/xor in the zbs extension. May 20 2021, 6:18 PM

Given how messy this is becoming and the amount of duplicated code for what are very similar patterns, does this not just want to be custom lowering? TableGen is starting to feel like the wrong tool for this.

Also, just a side thought: how does all this interact with using the integers for things like load or store offsets? Do we end up not being able to fold an immediate into the load/store as a result of some of these optimisations and thus lose out overall on large-offset addressing when B is enabled?

In D102893#2772692, @jrtc27 wrote:

Given how messy this is becoming and the amount of duplicated code for what are very similar patterns, does this not just want to be custom lowering? TableGen is starting to feel like the wrong tool for this.

Also, just a side thought: how does all this interact with using the integers for things like load or store offsets? Do we end up not being able to fold an immediate into the load/store as a result of some of these optimisations and thus lose out overall on large-offset addressing when B is enabled?

I’m not sure custom lowering is the right tool either. Making OR/XOR Custom will likely have other effects. Maybe custom isel in RISCVISelDAGToDAG?

These are immediates being used only by an OR/XOR so I don’t think it affects loads/stores folding. Am I missing something?

In D102893#2772744, @craig.topper wrote:

I’m not sure custom lowering is the right tool either. Making OR/XOR Custom will likely have other effects. Maybe custom isel in RISCVISelDAGToDAG?

These are immediates being used only by an OR/XOR so I don’t think it affects loads/stores folding. Am I missing something?

How about factoring the duplicated parts of the two PatLeafs into a function and calling it from both? That would make the code more readable and clearer.

I do not think ISelDAGToDAG is a good fit; it would introduce even more lines than the current form.

benshi001 updated this revision to Diff 346921. May 20 2021, 8:43 PM

I have uploaded a new version, in which the duplicated part is moved into a function that is called from both PatLeafs.

My comment on the previous patch was about constants like 2051 that can be done with

ORI a0, 3
BSETI a0, 11

I don't think this patch handles that.
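To make the 2051 example concrete, here is a minimal C++ sketch of the split being asked for: an immediate is separated into a low part that fits the non-negative range of a simm12 (bits 0 to 10) and the index of the single remaining set bit. The struct and function names are hypothetical and not part of the patch, and the real patterns additionally reject immediates that already fit a simm12 and require a single use.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type (illustration only, not from the patch).
struct OriBsetiSplit {
  bool Valid;   // true if Imm == Low | (1 << Bit) with Low in bits 0..10
  uint64_t Low; // operand for ORI/XORI
  unsigned Bit; // operand for BSETI/BINVI
};

// Split Imm into (Low, Bit). Fails when bits 11 and above do not contain
// exactly one set bit.
OriBsetiSplit splitOriBseti(uint64_t Imm) {
  const uint64_t High = Imm & ~uint64_t(0x7ff); // bits 11 and above
  if (High == 0 || (High & (High - 1)) != 0)    // zero or more than one bit
    return {false, 0, 0};
  unsigned Bit = 0;
  while ((High >> Bit) != 1) {
    ++Bit;
    assert(Bit < 64 && "index must stay within a 64-bit value");
  }
  return {true, Imm & 0x7ff, Bit};
}
```

With this helper, 2051 splits into Low = 3 and Bit = 11, which corresponds to the ORI/BSETI pair quoted above. A value with Low = 0 is a plain single-bit constant, which the existing BSETINVMask pattern already covers.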

llvm/test/CodeGen/RISCV/rv64zbs.ll
1389	This isn't really an improvement.

Harbormaster completed remote builds in B105560: Diff 346921. May 20 2021, 9:26 PM

benshi001 added inline comments. May 20 2021, 9:56 PM

llvm/test/CodeGen/RISCV/rv64zbs.ll
1389	It saves an extra register.

craig.topper added inline comments. May 20 2021, 10:14 PM

llvm/test/CodeGen/RISCV/rv64zbs.ll
1389	With 32 registers, is that an important optimization? Are we also going to implement (BSETI (BSETI (BSETI))) for cases with 3 set bits where none are in the lower 11 bits so we can't use ORI/XORI? It starts getting out of hand.

benshi001 updated this revision to Diff 346929. May 20 2021, 11:36 PM

benshi001 removed a parent revision: D102854: [RISCV][test] Add new tests of or/xor in the zbs extension.

benshi001 marked 2 inline comments as done.

benshi001 added inline comments.

llvm/test/CodeGen/RISCV/rv64zbs.ll
1389	Thanks. I will just handle the simplest case currently.

Harbormaster completed remote builds in B105565: Diff 346929. May 21 2021, 12:25 AM

benshi001 marked an inline comment as done. May 21 2021, 12:30 AM

I have changed my code to cover the simplest case, with the sign bit of simm12 handled in the same way as the upper bits.
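The reason bit 11 has to be treated like an upper bit can be sketched in C++ as follows: ORI and XORI sign-extend their 12-bit immediate, so an immediate field with bit 11 set also sets (or flips) every bit above it. The emulation functions below are illustrative stand-ins assuming XLEN = 64; they are not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 12-bit I-type immediate field, as ORI/XORI do.
int64_t signExtend12(uint32_t Field) {
  Field &= 0xfff;
  return (Field & 0x800) ? int64_t(Field) - 0x1000 : int64_t(Field);
}

// Emulated RV64 ORI: rd = rs1 | sext(imm12).
uint64_t ori(uint64_t Rs1, uint32_t Field) {
  return Rs1 | uint64_t(signExtend12(Field));
}

// Emulated RV64 BSETI: rd = rs1 | (1 << shamt).
uint64_t bseti(uint64_t Rs1, unsigned Shamt) {
  assert(Shamt < 64 && "shamt out of range for RV64");
  return Rs1 | (uint64_t(1) << Shamt);
}
```

Here `ori(0, 0x803)` yields 0xFFFFFFFFFFFFF803 rather than 2051, whereas `bseti(ori(0, 3), 11)` yields 2051. That is why the low part is masked with 0x7ff and bit 11 is left to BSETI/BINVI.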

LGTM

This revision is now accepted and ready to land. May 24 2021, 8:28 PM

Closed by commit rGbf77317049a8: [RISCV] Optimize xor/or with immediate in the zbs extension (authored by benshi001). May 24 2021, 11:14 PM

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rGbf77317049a8: [RISCV] Optimize xor/or with immediate in the zbs extension.

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVInstrInfoB.td	37 lines
llvm/test/CodeGen/RISCV/rv32zbs.ll	40 lines
llvm/test/CodeGen/RISCV/rv64zbs.ll	40 lines

Diff 347584

llvm/lib/Target/RISCV/RISCVInstrInfoB.td

[... 84 lines not shown ...]
 def BSETINVMask : ImmLeaf<XLenVT, [{
   if (Subtarget->is64Bit())
     return !isInt<12>(Imm) && isPowerOf2_64(Imm);
   return !isInt<12>(Imm) && isPowerOf2_32(Imm);
 }], BSETINVXForm>;

 // Check if (or r, i) can be optimized to (BSETI (BSETI r, i0), i1),
 // in which i = (1 << i0) | (1 << i1).
 def BSETINVTwoBitsMask : PatLeaf<(imm), [{
   if (!N->hasOneUse())
     return false;
   // The immediate should not be a simm12.
   if (isInt<12>(N->getSExtValue()))
     return false;
   // The immediate must have exactly two bits set.
   return countPopulation(N->getZExtValue()) == 2;
 }]>;

 def BSETINVTwoBitsMaskLow : SDNodeXForm<imm, [{
   uint64_t I = N->getZExtValue();
   return CurDAG->getTargetConstant(countTrailingZeros(I), SDLoc(N),
                                    N->getValueType(0));
 }]>;

 def BSETINVTwoBitsMaskHigh : SDNodeXForm<imm, [{
   uint64_t I = N->getZExtValue();
   return CurDAG->getTargetConstant(63 - countLeadingZeros(I), SDLoc(N),
                                    N->getValueType(0));
 }]>;

+// Check if (or r, imm) can be optimized to (BSETI (ORI r, i0), i1),
+// in which imm = i0 | (1 << i1).
+def BSETINVORIMask : PatLeaf<(imm), [{
+  if (!N->hasOneUse())
+    return false;
+  // The immediate should not be a simm12.
+  if (isInt<12>(N->getSExtValue()))
+    return false;
+  // There should be only one set bit from bit 11 to the top.
+  return isPowerOf2_64(N->getZExtValue() & ~0x7ff);
+}]>;
+
+def BSETINVORIMaskLow : SDNodeXForm<imm, [{
+  return CurDAG->getTargetConstant(N->getZExtValue() & 0x7ff,
+                                   SDLoc(N), N->getValueType(0));
+}]>;
+
 //===----------------------------------------------------------------------===//
 // Instruction class templates
 //===----------------------------------------------------------------------===//

 // Some of these templates should be moved to RISCVInstrFormats.td once the B
 // extension has been ratified.

 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
[... 627 lines not shown ...]
 def : Pat<(and (srl GPR:$rs1, uimmlog2xlen:$shamt), (XLenVT 1)),
           (BEXTI GPR:$rs1, uimmlog2xlen:$shamt)>;

 def : Pat<(or GPR:$r, BSETINVTwoBitsMask:$i),
           (BSETI (BSETI GPR:$r, (BSETINVTwoBitsMaskLow BSETINVTwoBitsMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVTwoBitsMask:$i))>;
 def : Pat<(xor GPR:$r, BSETINVTwoBitsMask:$i),
           (BINVI (BINVI GPR:$r, (BSETINVTwoBitsMaskLow BSETINVTwoBitsMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVTwoBitsMask:$i))>;
+def : Pat<(or GPR:$r, BSETINVORIMask:$i),
+          (BSETI (ORI GPR:$r, (BSETINVORIMaskLow BSETINVORIMask:$i)),
+                 (BSETINVTwoBitsMaskHigh BSETINVORIMask:$i))>;
+def : Pat<(xor GPR:$r, BSETINVORIMask:$i),
+          (BINVI (XORI GPR:$r, (BSETINVORIMaskLow BSETINVORIMask:$i)),
+                 (BSETINVTwoBitsMaskHigh BSETINVORIMask:$i))>;
 }

 // There's no encoding for roli in the the 'B' extension as it can be
 // implemented with rori by negating the immediate.
 let Predicates = [HasStdExtZbbOrZbp] in {
 def : PatGprImm<rotr, RORI, uimmlog2xlen>;
 def : Pat<(rotl GPR:$rs1, uimmlog2xlen:$shamt),
           (RORI GPR:$rs1, (ImmSubFromXLen uimmlog2xlen:$shamt))>;
[... 237 lines not shown ...]
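
As a rough illustration of what the BSETINVTwoBitsMask predicate and its Low/High transforms compute, the following standalone C++ sketch reimplements them without LLVM's helpers. It assumes XLEN = 64, omits the hasOneUse() check, and uses hypothetical function names in place of countPopulation, countTrailingZeros, and countLeadingZeros.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for isInt<12>: does V fit a signed 12-bit immediate?
bool fitsSimm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Stand-in for countPopulation: number of set bits.
unsigned popCount(uint64_t V) {
  unsigned N = 0;
  for (; V != 0; V &= V - 1) // clear the lowest set bit each iteration
    ++N;
  return N;
}

// Mirrors the predicate: not a simm12 and exactly two bits set.
bool isTwoBitsMask(uint64_t Imm) {
  return !fitsSimm12(int64_t(Imm)) && popCount(Imm) == 2;
}

// Mirrors BSETINVTwoBitsMaskLow: index of the lowest set bit.
unsigned lowBitIndex(uint64_t V) {
  assert(V != 0 && "needs at least one set bit");
  unsigned I = 0;
  while (((V >> I) & 1) == 0)
    ++I;
  return I;
}

// Mirrors BSETINVTwoBitsMaskHigh: index of the highest set bit.
unsigned highBitIndex(uint64_t V) {
  assert(V != 0 && "needs at least one set bit");
  unsigned I = 63;
  while (((V >> I) & 1) == 0)
    --I;
  return I;
}
```

For example, 0x5000 has bits 12 and 14 set, so it passes the predicate and maps to the shift amounts 12 and 14, while 96 is rejected because it already fits a simm12.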

llvm/test/CodeGen/RISCV/rv32zbs.ll

[... 737 lines not shown ...]
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a1, 1
 ; RV32I-NEXT: addi a1, a1, 3
 ; RV32I-NEXT: xor a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: xor_i32_4099:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: lui a1, 1
-; RV32IB-NEXT: addi a1, a1, 3
-; RV32IB-NEXT: xor a0, a0, a1
+; RV32IB-NEXT: xori a0, a0, 3
+; RV32IB-NEXT: binvi a0, a0, 12
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBS-LABEL: xor_i32_4099:
 ; RV32IBS: # %bb.0:
-; RV32IBS-NEXT: lui a1, 1
-; RV32IBS-NEXT: addi a1, a1, 3
-; RV32IBS-NEXT: xor a0, a0, a1
+; RV32IBS-NEXT: xori a0, a0, 3
+; RV32IBS-NEXT: binvi a0, a0, 12
 ; RV32IBS-NEXT: ret
   %xor = xor i32 %a, 4099
   ret i32 %xor
 }

 define i32 @xor_i32_96(i32 %a) nounwind {
 ; RV32I-LABEL: xor_i32_96:
 ; RV32I: # %bb.0:
[... 18 lines not shown ...]
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a1, 16
 ; RV32I-NEXT: addi a1, a1, 1365
 ; RV32I-NEXT: xor a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: xor_i32_66901:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: lui a1, 16
-; RV32IB-NEXT: addi a1, a1, 1365
-; RV32IB-NEXT: xor a0, a0, a1
+; RV32IB-NEXT: xori a0, a0, 1365
+; RV32IB-NEXT: binvi a0, a0, 16
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBS-LABEL: xor_i32_66901:
 ; RV32IBS: # %bb.0:
-; RV32IBS-NEXT: lui a1, 16
-; RV32IBS-NEXT: addi a1, a1, 1365
-; RV32IBS-NEXT: xor a0, a0, a1
+; RV32IBS-NEXT: xori a0, a0, 1365
+; RV32IBS-NEXT: binvi a0, a0, 16
 ; RV32IBS-NEXT: ret
   %xor = xor i32 %a, 66901
   ret i32 %xor
 }

 define i32 @or_i32_4098(i32 %a) nounwind {
 ; RV32I-LABEL: or_i32_4098:
 ; RV32I: # %bb.0:
[... 22 lines not shown ...]
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a1, 1
 ; RV32I-NEXT: addi a1, a1, 3
 ; RV32I-NEXT: or a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: or_i32_4099:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: lui a1, 1
-; RV32IB-NEXT: addi a1, a1, 3
-; RV32IB-NEXT: or a0, a0, a1
+; RV32IB-NEXT: ori a0, a0, 3
+; RV32IB-NEXT: bseti a0, a0, 12
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBS-LABEL: or_i32_4099:
 ; RV32IBS: # %bb.0:
-; RV32IBS-NEXT: lui a1, 1
-; RV32IBS-NEXT: addi a1, a1, 3
-; RV32IBS-NEXT: or a0, a0, a1
+; RV32IBS-NEXT: ori a0, a0, 3
+; RV32IBS-NEXT: bseti a0, a0, 12
 ; RV32IBS-NEXT: ret
   %or = or i32 %a, 4099
   ret i32 %or
 }

 define i32 @or_i32_96(i32 %a) nounwind {
 ; RV32I-LABEL: or_i32_96:
 ; RV32I: # %bb.0:
[... 18 lines not shown ...]
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a1, 16
 ; RV32I-NEXT: addi a1, a1, 1365
 ; RV32I-NEXT: or a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: or_i32_66901:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: lui a1, 16
-; RV32IB-NEXT: addi a1, a1, 1365
-; RV32IB-NEXT: or a0, a0, a1
+; RV32IB-NEXT: ori a0, a0, 1365
+; RV32IB-NEXT: bseti a0, a0, 16
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBS-LABEL: or_i32_66901:
 ; RV32IBS: # %bb.0:
-; RV32IBS-NEXT: lui a1, 16
-; RV32IBS-NEXT: addi a1, a1, 1365
-; RV32IBS-NEXT: or a0, a0, a1
+; RV32IBS-NEXT: ori a0, a0, 1365
+; RV32IBS-NEXT: bseti a0, a0, 16
 ; RV32IBS-NEXT: ret
   %or = or i32 %a, 66901
   ret i32 %or
 }
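
The updated RV32 check lines can be sanity-checked with a small C++ emulation of the two instructions involved. This is only a sketch: `xori` and `binvi` below are hand-written stand-ins for the RV32 instruction semantics, and the wrapper function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Emulated RV32 XORI: rd = rs1 ^ sext(imm12). The immediate is passed
// already sign-extended and must be within the simm12 range.
uint32_t xori(uint32_t Rs1, int32_t Imm12) {
  assert(Imm12 >= -2048 && Imm12 <= 2047 && "immediate must be a simm12");
  return Rs1 ^ uint32_t(Imm12);
}

// Emulated RV32 BINVI: rd = rs1 ^ (1 << shamt).
uint32_t binvi(uint32_t Rs1, unsigned Shamt) {
  assert(Shamt < 32 && "shamt out of range for RV32");
  return Rs1 ^ (uint32_t(1) << Shamt);
}

// The new two-instruction sequences expected for xor_i32_4099 and
// xor_i32_66901.
uint32_t xor4099(uint32_t A) { return binvi(xori(A, 3), 12); }
uint32_t xor66901(uint32_t A) { return binvi(xori(A, 1365), 16); }
```

Since 4099 = 3 | (1 << 12) and 66901 = 1365 | (1 << 16), both sequences should agree with a plain XOR against the full constant for any input, without needing the scratch register a1.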

llvm/test/CodeGen/RISCV/rv64zbs.ll

[... 1,238 lines not shown ...]
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: lui a1, 1
 ; RV64I-NEXT: addiw a1, a1, 3
 ; RV64I-NEXT: xor a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: xor_i64_4099:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: lui a1, 1
-; RV64IB-NEXT: addiw a1, a1, 3
-; RV64IB-NEXT: xor a0, a0, a1
+; RV64IB-NEXT: xori a0, a0, 3
+; RV64IB-NEXT: binvi a0, a0, 12
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBS-LABEL: xor_i64_4099:
 ; RV64IBS: # %bb.0:
-; RV64IBS-NEXT: lui a1, 1
-; RV64IBS-NEXT: addiw a1, a1, 3
-; RV64IBS-NEXT: xor a0, a0, a1
+; RV64IBS-NEXT: xori a0, a0, 3
+; RV64IBS-NEXT: binvi a0, a0, 12
 ; RV64IBS-NEXT: ret
   %xor = xor i64 %a, 4099
   ret i64 %xor
 }

 define i64 @xor_i64_96(i64 %a) nounwind {
 ; RV64I-LABEL: xor_i64_96:
 ; RV64I: # %bb.0:
[... 42 lines not shown ...]
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: lui a1, 16
 ; RV64I-NEXT: addiw a1, a1, 1365
 ; RV64I-NEXT: xor a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: xor_i64_66901:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: lui a1, 16
-; RV64IB-NEXT: addiw a1, a1, 1365
-; RV64IB-NEXT: xor a0, a0, a1
+; RV64IB-NEXT: xori a0, a0, 1365
+; RV64IB-NEXT: binvi a0, a0, 16
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBS-LABEL: xor_i64_66901:
 ; RV64IBS: # %bb.0:
-; RV64IBS-NEXT: lui a1, 16
-; RV64IBS-NEXT: addiw a1, a1, 1365
-; RV64IBS-NEXT: xor a0, a0, a1
+; RV64IBS-NEXT: xori a0, a0, 1365
+; RV64IBS-NEXT: binvi a0, a0, 16
 ; RV64IBS-NEXT: ret
   %xor = xor i64 %a, 66901
   ret i64 %xor
 }

 define i64 @or_i64_4099(i64 %a) nounwind {
 ; RV64I-LABEL: or_i64_4099:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: lui a1, 1
 ; RV64I-NEXT: addiw a1, a1, 3
 ; RV64I-NEXT: or a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: or_i64_4099:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: lui a1, 1
-; RV64IB-NEXT: addiw a1, a1, 3
-; RV64IB-NEXT: or a0, a0, a1
+; RV64IB-NEXT: ori a0, a0, 3
+; RV64IB-NEXT: bseti a0, a0, 12
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBS-LABEL: or_i64_4099:
 ; RV64IBS: # %bb.0:
-; RV64IBS-NEXT: lui a1, 1
-; RV64IBS-NEXT: addiw a1, a1, 3
-; RV64IBS-NEXT: or a0, a0, a1
+; RV64IBS-NEXT: ori a0, a0, 3
+; RV64IBS-NEXT: bseti a0, a0, 12
 ; RV64IBS-NEXT: ret
   %or = or i64 %a, 4099
   ret i64 %or
 }

 define i64 @or_i64_96(i64 %a) nounwind {
 ; RV64I-LABEL: or_i64_96:
 ; RV64I: # %bb.0:
[... 18 lines not shown ...]
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: lui a1, 16
 ; RV64I-NEXT: addiw a1, a1, 1365
 ; RV64I-NEXT: or a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: or_i64_66901:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: lui a1, 16
-; RV64IB-NEXT: addiw a1, a1, 1365
-; RV64IB-NEXT: or a0, a0, a1
+; RV64IB-NEXT: ori a0, a0, 1365
+; RV64IB-NEXT: bseti a0, a0, 16
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBS-LABEL: or_i64_66901:
 ; RV64IBS: # %bb.0:
-; RV64IBS-NEXT: lui a1, 16
-; RV64IBS-NEXT: addiw a1, a1, 1365
-; RV64IBS-NEXT: or a0, a0, a1
+; RV64IBS-NEXT: ori a0, a0, 1365
+; RV64IBS-NEXT: bseti a0, a0, 16
 ; RV64IBS-NEXT: ret
   %or = or i64 %a, 66901
   ret i64 %or
 }