This is an archive of the discontinued LLVM Phabricator instance.

Paths

- lib/Target/X86/X86InstrInfo.cpp
- lib/Target/X86/X86InstrShiftRotate.td
- test/CodeGen/X86/rot32.ll
- test/CodeGen/X86/rot64.ll

Differential D59391

[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Closed, Public

Authored by craig.topper on Mar 14 2019, 3:10 PM.


Details

Reviewers

RKSimon
spatel
efriedma
andreadb

Commits

rG7c9afc35bce9: [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
rL357096: [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD

Summary

Haswell CPUs have special support for SHLD/SHRD when the same register is used for both sources. Such an instruction goes to the rotate/shift unit on port 0 or 6, which gives it 1-cycle latency and 0.5-cycle reciprocal throughput. When the two source registers differ, it becomes a 3-cycle operation on port 1. Sandy Bridge and Ivy Bridge always have 1-cycle latency and 0.5-cycle reciprocal throughput for any SHLD.

When the FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. However, MachineCopyPropagation can look through a copy and rewrite one of the sources to a different register. This breaks the hardware optimization.

This patch adds pseudo instructions that hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure whether this is the best way to do this, or whether there is some other way we can make this work.

Fixes PR41055

Diff Detail

Event Timeline

craig.topper created this revision. Mar 14 2019, 3:10 PM

Herald added a subscriber: jdoerfert. Mar 14 2019, 3:10 PM

I've been toying with the idea of using a pseudo for all SHLD/SHRD cases so we can make it easier to select between that and the expanded shift pattern depending on scheduler-model/register-pressure etc. instead of trying to make the decision in DAG with the feature bits. I don't know if this could be a first step towards this? @andreadb Any thoughts?

In D59391#1432572, @RKSimon wrote:

I've been toying with the idea of using a pseudo for all SHLD/SHRD cases so we can make it easier to select between that and the expanded shift pattern depending on scheduler-model/register-pressure etc. instead of trying to make the decision in DAG with the feature bits. I don't know if this could be a first step towards this? @andreadb Any thoughts?

I agree that this could definitely be a first step towards implementing scheduler-driven peephole optimizations.

That being said, this is easier said than done. Coming up with a good (and generic) framework and a decent cost model for doing this is not simple at all... But it is definitely an interesting project: we could start with a simple prototype pass which then evolves over time. Just an idea.

To start, it would be nice to have a pass that checks whether it is profitable to convert a SHLD/SHRD based on the feedback of a simple cost model, which internally takes into account potential increases/decreases in register pressure as well as potential perf gains. This would be similar in principle to what we already do in passes like TwoAddressInstruction, where heuristics are used to identify cases where it is profitable to commute/convert instructions (for example, based on the number of users of a definition, intervening instructions, etc.).

Ping. Is this ok by itself?

In D59391#1443307, @craig.topper wrote:

Ping. Is this ok by itself?

LGTM - sorry for sidetracking this ticket!

This revision is now accepted and ready to land. Mar 27 2019, 8:51 AM

Closed by commit rL357096: [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD (authored by ctopper). Mar 27 2019, 10:30 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 27 2019, 10:30 AM

Revision Contents

- lib/Target/X86/X86InstrInfo.cpp (22 lines)
- lib/Target/X86/X86InstrShiftRotate.td (28 lines)
- test/CodeGen/X86/rot32.ll (16 lines)
- test/CodeGen/X86/rot64.ll (16 lines)

Diff 190734

lib/Target/X86/X86InstrInfo.cpp


@@ ... @@ if (TRI->getEncodingValue(SrcReg) < 16) {
     // Change the destination to a 512-bit register.
     SrcReg = TRI->getMatchingSuperReg(SrcReg, SubIdx, &X86::VR512RegClass);
     MIB->getOperand(X86::AddrNumOperands).setReg(SrcReg);
     MIB.addImm(0x0); // Append immediate to extract from the lower bits.
   }
 
   return true;
 }
 
+static bool expandSHXDROT(MachineInstrBuilder &MIB, const MCInstrDesc &Desc) {
+  MIB->setDesc(Desc);
+  int64_t ShiftAmt = MIB->getOperand(2).getImm();
+  // Temporarily remove the immediate so we can add another source register.
+  MIB->RemoveOperand(2);
+  // Add the register. Don't copy the kill flag if there is one.
+  MIB.addReg(MIB->getOperand(1).getReg(),
+             getUndefRegState(MIB->getOperand(1).isUndef()));
+  // Add back the immediate.
+  MIB.addImm(ShiftAmt);
+  return true;
+}
+
 bool X86InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   bool HasAVX = Subtarget.hasAVX();
   MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);
   switch (MI.getOpcode()) {
   case X86::MOV32r0:
     return Expand2AddrUndef(MIB, get(X86::XOR32rr));
   case X86::MOV32r1:
     return expandMOV32r1(MIB, this, /*MinusOne=*/ false);
@@ ... @@ bool X86InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   case X86::KSET1D: return Expand2AddrKreg(MIB, get(X86::KXNORDrr), X86::K0);
   case X86::KSET1Q: return Expand2AddrKreg(MIB, get(X86::KXNORQrr), X86::K0);
   case TargetOpcode::LOAD_STACK_GUARD:
     expandLoadStackGuard(MIB, *this);
     return true;
   case X86::XOR64_FP:
   case X86::XOR32_FP:
     return expandXorFP(MIB, *this);
+  case X86::SHLDROT32ri:
+    return expandSHXDROT(MIB, get(X86::SHLD32rri8));
+  case X86::SHLDROT64ri:
+    return expandSHXDROT(MIB, get(X86::SHLD64rri8));
+  case X86::SHRDROT32ri:
+    return expandSHXDROT(MIB, get(X86::SHRD32rri8));
+  case X86::SHRDROT64ri:
+    return expandSHXDROT(MIB, get(X86::SHRD64rri8));
   }
   return false;
 }
 
 /// Return true for all instructions that only update
 /// the first 32 or 64-bits of the destination register and leave the rest
 /// unmodified. This can be used to avoid folding loads if the instructions
 /// only update part of the destination register, and the non-updated part is

lib/Target/X86/X86InstrShiftRotate.td

@@ ... @@ def : Pat<(store (rotr (loadi16 addr:$dst), (i8 15)), addr:$dst),
           (ROL16m1 addr:$dst)>;
 def : Pat<(store (rotr (loadi32 addr:$dst), (i8 31)), addr:$dst),
           (ROL32m1 addr:$dst)>;
 def : Pat<(store (rotr (loadi64 addr:$dst), (i8 63)), addr:$dst),
           (ROL64m1 addr:$dst)>, Requires<[In64BitMode]>;
 
 // Sandy Bridge and newer Intel processors support faster rotates using
 // SHLD to avoid a partial flag update on the normal rotate instructions.
-let Predicates = [HasFastSHLDRotate], AddedComplexity = 5 in {
-  def : Pat<(rotl GR32:$src, (i8 imm:$shamt)),
-            (SHLD32rri8 GR32:$src, GR32:$src, imm:$shamt)>;
-  def : Pat<(rotl GR64:$src, (i8 imm:$shamt)),
-            (SHLD64rri8 GR64:$src, GR64:$src, imm:$shamt)>;
-
-  def : Pat<(rotr GR32:$src, (i8 imm:$shamt)),
-            (SHRD32rri8 GR32:$src, GR32:$src, imm:$shamt)>;
-  def : Pat<(rotr GR64:$src, (i8 imm:$shamt)),
-            (SHRD64rri8 GR64:$src, GR64:$src, imm:$shamt)>;
+// Use a pseudo so that TwoInstructionPass and register allocation will see
+// this as unary instruction.
+let Predicates = [HasFastSHLDRotate], AddedComplexity = 5,
+    Defs = [EFLAGS], isPseudo = 1, SchedRW = [WriteSHDrri],
+    Constraints = "$src1 = $dst" in {
+  def SHLDROT32ri  : I<0, Pseudo, (outs GR32:$dst),
+                       (ins GR32:$src1, u8imm:$shamt), "",
+                     [(set GR32:$dst, (rotl GR32:$src1, (i8 imm:$shamt)))]>;
+  def SHLDROT64ri  : I<0, Pseudo, (outs GR64:$dst),
+                       (ins GR64:$src1, u8imm:$shamt), "",
+                     [(set GR64:$dst, (rotl GR64:$src1, (i8 imm:$shamt)))]>;
+
+  def SHRDROT32ri  : I<0, Pseudo, (outs GR32:$dst),
+                       (ins GR32:$src1, u8imm:$shamt), "",
+                     [(set GR32:$dst, (rotr GR32:$src1, (i8 imm:$shamt)))]>;
+  def SHRDROT64ri  : I<0, Pseudo, (outs GR64:$dst),
+                       (ins GR64:$src1, u8imm:$shamt), "",
+                     [(set GR64:$dst, (rotr GR64:$src1, (i8 imm:$shamt)))]>;
 }
 
 def ROT32L2R_imm8  : SDNodeXForm<imm, [{
   // Convert a ROTL shamt to a ROTR shamt on 32-bit integer.
   return getI8Imm(32 - N->getZExtValue(), SDLoc(N));
 }]>;
 
 def ROT64L2R_imm8  : SDNodeXForm<imm, [{

test/CodeGen/X86/rot32.ll

@@ ... @@
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: roll $7, %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: xfoo:
 ; SHLD64: # %bb.0: # %entry
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shldl $7, %edi, %eax
+; SHLD64-NEXT: shldl $7, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: xfoo:
 ; BMI264: # %bb.0: # %entry
 ; BMI264-NEXT: rorxl $25, %edi, %eax
 ; BMI264-NEXT: retq
 entry:
   %0 = lshr i32 %x, 25
@@ ... @@
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: roll $25, %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: xun:
 ; SHLD64: # %bb.0: # %entry
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shldl $25, %edi, %eax
+; SHLD64-NEXT: shldl $25, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: xun:
 ; BMI264: # %bb.0: # %entry
 ; BMI264-NEXT: rorxl $7, %edi, %eax
 ; BMI264-NEXT: retq
 entry:
   %0 = lshr i32 %x, 7
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: roll $7, %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshl:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shldl $7, %edi, %eax
+; SHLD64-NEXT: shldl $7, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshl:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $25, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 7)
   ret i32 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: roll %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshl1:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shldl $1, %edi, %eax
+; SHLD64-NEXT: shldl $1, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshl1:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $31, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 1)
   ret i32 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: rorl %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshl31:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shldl $31, %edi, %eax
+; SHLD64-NEXT: shldl $31, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshl31:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $1, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 31)
   ret i32 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: rorl $7, %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshr:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shrdl $7, %edi, %eax
+; SHLD64-NEXT: shrdl $7, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshr:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $7, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 7)
   ret i32 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: rorl %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshr1:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shrdl $1, %edi, %eax
+; SHLD64-NEXT: shrdl $1, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshr1:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $1, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 1)
   ret i32 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: roll %eax
 ; X64-NEXT: retq
 ;
 ; SHLD64-LABEL: fshr31:
 ; SHLD64: # %bb.0:
 ; SHLD64-NEXT: movl %edi, %eax
-; SHLD64-NEXT: shrdl $31, %edi, %eax
+; SHLD64-NEXT: shrdl $31, %eax, %eax
 ; SHLD64-NEXT: retq
 ;
 ; BMI264-LABEL: fshr31:
 ; BMI264: # %bb.0:
 ; BMI264-NEXT: rorxl $31, %edi, %eax
 ; BMI264-NEXT: retq
   %f = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 31)
   ret i32 %f

test/CodeGen/X86/rot64.ll

@@ ... @@
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rolq $7, %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: xfoo:
 ; SHLD: # %bb.0: # %entry
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shldq $7, %rdi, %rax
+; SHLD-NEXT: shldq $7, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: xfoo:
 ; BMI2: # %bb.0: # %entry
 ; BMI2-NEXT: rorxq $57, %rdi, %rax
 ; BMI2-NEXT: retq
 entry:
   %0 = lshr i64 %x, 57
@@ ... @@
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rolq $57, %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: xun:
 ; SHLD: # %bb.0: # %entry
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shldq $57, %rdi, %rax
+; SHLD-NEXT: shldq $57, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: xun:
 ; BMI2: # %bb.0: # %entry
 ; BMI2-NEXT: rorxq $7, %rdi, %rax
 ; BMI2-NEXT: retq
 entry:
   %0 = lshr i64 %x, 7
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rolq $7, %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshl:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shldq $7, %rdi, %rax
+; SHLD-NEXT: shldq $7, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshl:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $57, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshl.i64(i64 %x, i64 %x, i64 7)
   ret i64 %f
 }
 declare i64 @llvm.fshl.i64(i64, i64, i64)
 
 define i64 @fshl1(i64 %x) nounwind {
 ; X64-LABEL: fshl1:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rolq %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshl1:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shldq $1, %rdi, %rax
+; SHLD-NEXT: shldq $1, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshl1:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $63, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshl.i64(i64 %x, i64 %x, i64 1)
   ret i64 %f
 }
 
 define i64 @fshl63(i64 %x) nounwind {
 ; X64-LABEL: fshl63:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rorq %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshl63:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shldq $63, %rdi, %rax
+; SHLD-NEXT: shldq $63, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshl63:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $1, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshl.i64(i64 %x, i64 %x, i64 63)
   ret i64 %f
@@ ... @@
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rorq $7, %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshr:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shrdq $7, %rdi, %rax
+; SHLD-NEXT: shrdq $7, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshr:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $7, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshr.i64(i64 %x, i64 %x, i64 7)
   ret i64 %f
 }
 declare i64 @llvm.fshr.i64(i64, i64, i64)
 
 define i64 @fshr1(i64 %x) nounwind {
 ; X64-LABEL: fshr1:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rorq %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshr1:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shrdq $1, %rdi, %rax
+; SHLD-NEXT: shrdq $1, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshr1:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $1, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshr.i64(i64 %x, i64 %x, i64 1)
   ret i64 %f
 }
 
 define i64 @fshr63(i64 %x) nounwind {
 ; X64-LABEL: fshr63:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: rolq %rax
 ; X64-NEXT: retq
 ;
 ; SHLD-LABEL: fshr63:
 ; SHLD: # %bb.0:
 ; SHLD-NEXT: movq %rdi, %rax
-; SHLD-NEXT: shrdq $63, %rdi, %rax
+; SHLD-NEXT: shrdq $63, %rax, %rax
 ; SHLD-NEXT: retq
 ;
 ; BMI2-LABEL: fshr63:
 ; BMI2: # %bb.0:
 ; BMI2-NEXT: rorxq $63, %rdi, %rax
 ; BMI2-NEXT: retq
   %f = call i64 @llvm.fshr.i64(i64 %x, i64 %x, i64 63)
   ret i64 %f