This is an archive of the discontinued LLVM Phabricator instance.

Differential D55866

[DAGCombiner] allow narrowing of add followed by truncate
Closed, Public

Authored by spatel on Dec 18 2018, 4:38 PM.

Details

Reviewers

arsenm
kparzysz
craig.topper
RKSimon
andreadb

Commits

rG4b537aaf6deb: [DAGCombiner] allow narrowing of add followed by truncate
rL350006: [DAGCombiner] allow narrowing of add followed by truncate

Summary

trunc (add X, C) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
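
As a quick sanity check of why the fold is sound: truncation keeps only the low bits, and the low bits of a sum depend only on the low bits of its operands. The sketch below is not LLVM code; `wide_then_trunc`, `narrow_add` and `check_forms_agree` are hypothetical helper names, and the 32-to-8-bit widths are chosen only for illustration. The constant is the one from the pr32329.ll check lines in this diff, whose low byte is 0x71 (113), matching the new `addb $113`.

```cpp
#include <cassert>
#include <cstdint>

// trunc (add X, C): add in 32 bits, then keep only the low 8 bits.
constexpr uint8_t wide_then_trunc(uint32_t x, uint32_t c) {
  return static_cast<uint8_t>(x + c);
}

// add (trunc X), C': truncate both operands first, then add in 8 bits.
constexpr uint8_t narrow_add(uint32_t x, uint32_t c) {
  return static_cast<uint8_t>(static_cast<uint8_t>(x) +
                              static_cast<uint8_t>(c));
}

// The narrowed constant C' is simply the low byte of C.
static_assert(static_cast<uint8_t>(0xAA51BE71u) == 113,
              "low byte of the pr32329.ll constant");

// Compares both forms for every x in [0, limit); asserts on a mismatch.
inline void check_forms_agree(uint32_t c, uint32_t limit) {
  for (uint32_t x = 0; x < limit; ++x)
    assert(wide_then_trunc(x, c) == narrow_add(x, c));
}
```

Calling `check_forms_agree(0xAA51BE71u, 100000)` exercises both forms over a range of inputs; the same argument applies to the other opcodes in the list (sub, mul, and, or, xor), since each computes its low result bits from the low operand bits alone.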

This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing.

Event Timeline

spatel created this revision. Dec 18 2018, 4:38 PM

Herald added subscribers: javed.absar, nhaehnle, wdng and 2 others. Dec 18 2018, 4:38 PM

andreadb added a subscriber: andreadb. Dec 18 2018, 5:35 PM

RKSimon added a reviewer: andreadb. Dec 19 2018, 12:40 AM

andreadb added inline comments. Dec 19 2018, 3:24 AM

test/CodeGen/X86/pr32345.ll, lines 122–126:

Not sure what is going on here. We fail to realize that XOR is both commutative and associative. We are essentially doing `(%eax XOR %ecx) XOR %ecx`, which is equivalent to `%eax XOR (%ecx XOR %ecx)`. The second XOR is a zero idiom, so the expression becomes `%eax XOR 0`, which is %eax. That entire computation could be folded away. We should generate just this:

movzwl var_27, %ecx
movzwl var_22, %eax

As a side note: we even fail to realize that the last zero-extending move is redundant! The upper half of EAX is already zero, because it comes from a zero-extending move. The XOR is done with ECX, which is also zero-extended, so the upper bits cannot possibly be anything other than zero. The final zero-extending move is therefore completely redundant, because the upper bits of the result are already zero.
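
The claim in the inline comment can be checked with a small model of the instruction sequence. This is only a sketch: the function names are hypothetical, and the register comments mirror the 686 check lines of pr32345.ll rather than any real API.

```cpp
#include <cassert>
#include <cstdint>

// Models the flagged sequence on two zero-extended 16-bit loads.
inline uint32_t xor_twice(uint16_t var_22, uint16_t var_27) {
  uint32_t eax = var_22; // movzwl var_22, %eax
  uint32_t ecx = var_27; // movzwl var_27, %ecx
  // xorw %cx, %ax: only the low 16 bits of %eax change.
  eax = (eax & 0xFFFF0000u) | ((eax ^ ecx) & 0xFFFFu);
  // xorl %ecx, %eax
  eax ^= ecx;
  return eax;
}

// True if the two XORs cancel and a trailing movzwl %ax, %eax
// (masking to 16 bits) would leave the value unchanged.
inline bool sequence_is_redundant(uint16_t var_22, uint16_t var_27) {
  uint32_t eax = xor_twice(var_22, var_27);
  return eax == var_22 && (eax & 0xFFFFu) == eax;
}

// Spot-checks a few input pairs, including the 16-bit extremes.
inline void check_sample_inputs() {
  const uint16_t samples[] = {0x0000, 0x0001, 0x1234, 0xBF1E, 0xFFFF};
  for (uint16_t a : samples)
    for (uint16_t b : samples)
      assert(sequence_is_redundant(a, b));
}
```

For every pair of 16-bit inputs, `xor_twice` returns `var_22` unchanged, which is why the two loads alone would suffice.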

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

In D55866#1336120, @andreadb wrote:

From an x86 point of view, this change looks good.

At some point, we should address the poor codegen from test/CodeGen/X86/pr32345.ll. Maybe you could raise a bug for it.

I don't think it's worth the effort if we look at the background for this test; the intent was to check for a crash with *unoptimized* code:
https://bugs.llvm.org/show_bug.cgi?id=32345
rL298923
...so the test includes CSE values that the DAG never expects to encounter. I think the optimized RUN was just added for completeness/sanity. The whole thing is folded to 'unreachable' in IR.

In D55866#1336263, @spatel wrote:

I don't think it's worth the effort if we look at the background for this test; the intent was to check for a crash with *unoptimized* code:
https://bugs.llvm.org/show_bug.cgi?id=32345
rL298923
...so the test includes CSE values that the DAG never expects to encounter. I think the optimized RUN was just added for completeness/sanity. The whole thing is folded to 'unreachable' in IR.

Fair enough. I didn't check whether it was folded by the optimizers at the IR level.
Given that the whole sequence is optimized out in IR, I agree with you: it is not worth looking into.

andreadb accepted this revision. Dec 19 2018, 7:01 AM

This revision is now accepted and ready to land. Dec 19 2018, 7:01 AM

Closed by commit rL350006: [DAGCombiner] allow narrowing of add followed by truncate (authored by spatel). Dec 22 2018, 9:13 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D57377: [CGP] Add support for sinking operands to their users, if they are free. Jan 30 2019, 8:59 AM

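Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	3 lines
test/CodeGen/AMDGPU/imm16.ll	6 lines
test/CodeGen/AMDGPU/trunc-combine.ll	4 lines
test/CodeGen/Hexagon/vect/vect-vaslw.ll	2 lines
test/CodeGen/X86/load-combine.ll	8 lines
test/CodeGen/X86/pr32329.ll	4 lines
test/CodeGen/X86/pr32345.ll	24 lines
test/CodeGen/X86/pr33290.ll	2 lines
test/CodeGen/X86/pr34381.ll	6 lines
test/CodeGen/X86/pr35765.ll	5 lines
test/CodeGen/X86/scheduler-backtracking.ll	40 lines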

Diff 178804

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[... 9,817 lines not shown ...]	SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
return NewVSel;		return NewVSel;

// Narrow a suitable binary operation with a non-opaque constant operand by		// Narrow a suitable binary operation with a non-opaque constant operand by
// moving it ahead of the truncate. This is limited to pre-legalization		// moving it ahead of the truncate. This is limited to pre-legalization
// because targets may prefer a wider type during later combines and invert		// because targets may prefer a wider type during later combines and invert
// this transform.		// this transform.
switch (N0.getOpcode()) {		switch (N0.getOpcode()) {
// TODO: Add case for ADD - that will likely require a change in logic here		case ISD::ADD:
// or target-specific changes to avoid regressions.
case ISD::SUB:		case ISD::SUB:
case ISD::MUL:		case ISD::MUL:
case ISD::AND:		case ISD::AND:
case ISD::OR:		case ISD::OR:
case ISD::XOR:		case ISD::XOR:
if (!LegalOperations && N0.hasOneUse() &&		if (!LegalOperations && N0.hasOneUse() &&
(isConstantOrConstantVector(N0.getOperand(0), true) ||		(isConstantOrConstantVector(N0.getOperand(0), true) ||
isConstantOrConstantVector(N0.getOperand(1), true))) {		isConstantOrConstantVector(N0.getOperand(1), true))) {
[... 9,326 lines not shown ...]
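
To make the guard in this hunk easier to read, here is a toy model of it. Everything in it is a hypothetical stand-in (`Opcode`, `ToyBinOp`, `may_narrow_before_truncate`), not an LLVM type or function. It only mirrors the condition visible above; the excerpt is cut off right after that condition, so any further checks in the real function are not modeled, and the constant flags are assumed to mean non-opaque constants as the code comment describes.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not LLVM types.
enum class Opcode { Add, Sub, Mul, And, Or, Xor, UDiv };

struct ToyBinOp {
  Opcode opcode;
  bool has_one_use;      // models N0.hasOneUse()
  bool lhs_is_constant;  // models isConstantOrConstantVector(operand 0)
  bool rhs_is_constant;  // models isConstantOrConstantVector(operand 1)
};

// Opcodes listed in the switch; Add is the case this revision enables.
inline bool is_listed_opcode(Opcode op) {
  switch (op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Mul:
  case Opcode::And:
  case Opcode::Or:
  case Opcode::Xor:
    return true;
  default:
    return false;
  }
}

// Mirrors the visible condition: only before operation legalization,
// only for single-use nodes, and only with a constant operand.
inline bool may_narrow_before_truncate(const ToyBinOp &op,
                                       bool legal_operations) {
  return is_listed_opcode(op.opcode) && !legal_operations &&
         op.has_one_use && (op.lhs_is_constant || op.rhs_is_constant);
}

// Spot-checks the model against the cases described in the summary.
inline void check_guard_model() {
  assert(may_narrow_before_truncate({Opcode::Add, true, false, true}, false));
  assert(!may_narrow_before_truncate({Opcode::Add, true, false, true}, true));
  assert(!may_narrow_before_truncate({Opcode::UDiv, true, false, true}, false));
}
```

`UDiv` is included only as an example of an opcode outside the list: the low bits of a quotient depend on the high bits of the operands, so it cannot be narrowed this way.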

test/CodeGen/AMDGPU/imm16.ll

	[... 260 lines not shown ...]
	; VI: buffer_store_short [[REG]]			; VI: buffer_store_short [[REG]]
	define amdgpu_kernel void @add_inline_imm_16_f16(half addrspace(1)* %out, half %x) {			define amdgpu_kernel void @add_inline_imm_16_f16(half addrspace(1)* %out, half %x) {
	%y = fadd half %x, 0xH0010			%y = fadd half %x, 0xH0010
	store half %y, half addrspace(1)* %out			store half %y, half addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}add_inline_imm_neg_1_f16:			; GCN-LABEL: {{^}}add_inline_imm_neg_1_f16:
	; VI: v_add_u32_e32 [[REG:v[0-9]+]], vcc, -1			; VI: v_add_u16_e32 [[REG:v[0-9]+]], -1, [[REG:v[0-9]+]]
	; VI: buffer_store_short [[REG]]			; VI: buffer_store_short [[REG]]
	define amdgpu_kernel void @add_inline_imm_neg_1_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {			define amdgpu_kernel void @add_inline_imm_neg_1_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {
	%x = load i16, i16 addrspace(1)* %in			%x = load i16, i16 addrspace(1)* %in
	%y = add i16 %x, -1			%y = add i16 %x, -1
	%ybc = bitcast i16 %y to half			%ybc = bitcast i16 %y to half
	store half %ybc, half addrspace(1)* %out			store half %ybc, half addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}add_inline_imm_neg_2_f16:			; GCN-LABEL: {{^}}add_inline_imm_neg_2_f16:
	; VI: v_add_u32_e32 [[REG:v[0-9]+]], vcc, 0xfffe			; VI: v_add_u16_e32 [[REG:v[0-9]+]], -2, [[REG:v[0-9]+]]
	; VI: buffer_store_short [[REG]]			; VI: buffer_store_short [[REG]]
	define amdgpu_kernel void @add_inline_imm_neg_2_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {			define amdgpu_kernel void @add_inline_imm_neg_2_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {
	%x = load i16, i16 addrspace(1)* %in			%x = load i16, i16 addrspace(1)* %in
	%y = add i16 %x, -2			%y = add i16 %x, -2
	%ybc = bitcast i16 %y to half			%ybc = bitcast i16 %y to half
	store half %ybc, half addrspace(1)* %out			store half %ybc, half addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}add_inline_imm_neg_16_f16:			; GCN-LABEL: {{^}}add_inline_imm_neg_16_f16:
	; VI: v_add_u32_e32 [[REG:v[0-9]+]], vcc, 0xfff0			; VI: v_add_u16_e32 [[REG:v[0-9]+]], -16, [[REG:v[0-9]+]]
	; VI: buffer_store_short [[REG]]			; VI: buffer_store_short [[REG]]
	define amdgpu_kernel void @add_inline_imm_neg_16_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {			define amdgpu_kernel void @add_inline_imm_neg_16_f16(half addrspace(1)* %out, i16 addrspace(1)* %in) {
	%x = load i16, i16 addrspace(1)* %in			%x = load i16, i16 addrspace(1)* %in
	%y = add i16 %x, -16			%y = add i16 %x, -16
	%ybc = bitcast i16 %y to half			%ybc = bitcast i16 %y to half
	store half %ybc, half addrspace(1)* %out			store half %ybc, half addrspace(1)* %out
	ret void			ret void
	}			}
	[... 20 lines not shown ...]

test/CodeGen/AMDGPU/trunc-combine.ll

[... 19 lines not shown ...]	define i32 @trunc_bitcast_i64_lshr_32_i32(i64 %bar) {
%trunc = trunc i64 %srl to i32		%trunc = trunc i64 %srl to i32
ret i32 %trunc		ret i32 %trunc
}		}

; GCN-LABEL: {{^}}trunc_bitcast_v2i32_to_i16:		; GCN-LABEL: {{^}}trunc_bitcast_v2i32_to_i16:
; GCN: _load_dword		; GCN: _load_dword
; GCN-NOT: _load_dword		; GCN-NOT: _load_dword
; GCN-NOT: v_mov_b32		; GCN-NOT: v_mov_b32
; GCN: v_add_u32_e32 v0, vcc, 4, v0		; GCN: v_add_u16_e32 v0, 4, v0
define i16 @trunc_bitcast_v2i32_to_i16(<2 x i32> %bar) {		define i16 @trunc_bitcast_v2i32_to_i16(<2 x i32> %bar) {
%load0 = load i32, i32 addrspace(1)* undef		%load0 = load i32, i32 addrspace(1)* undef
%load1 = load i32, i32 addrspace(1)* null		%load1 = load i32, i32 addrspace(1)* null
%insert.0 = insertelement <2 x i32> undef, i32 %load0, i32 0		%insert.0 = insertelement <2 x i32> undef, i32 %load0, i32 0
%insert.1 = insertelement <2 x i32> %insert.0, i32 99, i32 1		%insert.1 = insertelement <2 x i32> %insert.0, i32 99, i32 1
%bc = bitcast <2 x i32> %insert.1 to i64		%bc = bitcast <2 x i32> %insert.1 to i64
%trunc = trunc i64 %bc to i16		%trunc = trunc i64 %bc to i16
%add = add i16 %trunc, 4		%add = add i16 %trunc, 4
ret i16 %add		ret i16 %add
}		}

; Make sure there's no crash if the source vector type is FP		; Make sure there's no crash if the source vector type is FP
; GCN-LABEL: {{^}}trunc_bitcast_v2f32_to_i16:		; GCN-LABEL: {{^}}trunc_bitcast_v2f32_to_i16:
; GCN: _load_dword		; GCN: _load_dword
; GCN-NOT: _load_dword		; GCN-NOT: _load_dword
; GCN-NOT: v_mov_b32		; GCN-NOT: v_mov_b32
; GCN: v_add_u32_e32 v0, vcc, 4, v0		; GCN: v_add_u16_e32 v0, 4, v0
define i16 @trunc_bitcast_v2f32_to_i16(<2 x float> %bar) {		define i16 @trunc_bitcast_v2f32_to_i16(<2 x float> %bar) {
%load0 = load float, float addrspace(1)* undef		%load0 = load float, float addrspace(1)* undef
%load1 = load float, float addrspace(1)* null		%load1 = load float, float addrspace(1)* null
%insert.0 = insertelement <2 x float> undef, float %load0, i32 0		%insert.0 = insertelement <2 x float> undef, float %load0, i32 0
%insert.1 = insertelement <2 x float> %insert.0, float 4.0, i32 1		%insert.1 = insertelement <2 x float> %insert.0, float 4.0, i32 1
%bc = bitcast <2 x float> %insert.1 to i64		%bc = bitcast <2 x float> %insert.1 to i64
%trunc = trunc i64 %bc to i16		%trunc = trunc i64 %bc to i16
%add = add i16 %trunc, 4		%add = add i16 %trunc, 4
[... 29 lines not shown ...]

test/CodeGen/Hexagon/vect/vect-vaslw.ll

	; RUN: llc -march=hexagon < %s | FileCheck %s			; RUN: llc -march=hexagon < %s | FileCheck %s
	; CHECK: vaslw			; CHECK: vaslh

	target datalayout = "e-p:32:32:32-i64:64:64-i32:32:32-i16:16:16-i1:32:32-f64:64:64-f32:32:32-v64:64:64-v32:32:32-a0:0-n16:32"			target datalayout = "e-p:32:32:32-i64:64:64-i32:32:32-i16:16:16-i1:32:32-f64:64:64-f32:32:32-v64:64:64-v32:32:32-a0:0-n16:32"
	target triple = "hexagon-unknown-linux-gnu"			target triple = "hexagon-unknown-linux-gnu"

	define void @foo(i16* nocapture %v) nounwind {			define void @foo(i16* nocapture %v) nounwind {
	entry:			entry:
	%p_arrayidx = getelementptr i16, i16* %v, i32 4			%p_arrayidx = getelementptr i16, i16* %v, i32 4
	%vector_ptr = bitcast i16* %p_arrayidx to <4 x i16>*			%vector_ptr = bitcast i16* %p_arrayidx to <4 x i16>*
	[... 23 lines not shown ...]

test/CodeGen/X86/load-combine.ll

	[... 909 lines not shown ...]
	; i8* arg; i32 i;			; i8* arg; i32 i;
	; p = arg + 12;			; p = arg + 12;
	; (i32) p[i] | ((i32) p[i + 1] << 8) | ((i32) p[i + 2] << 16) | ((i32) p[i + 3] << 24)			; (i32) p[i] | ((i32) p[i + 1] << 8) | ((i32) p[i + 2] << 16) | ((i32) p[i + 3] << 24)
	define i32 @load_i32_by_i8_base_offset_index(i8* %arg, i32 %i) {			define i32 @load_i32_by_i8_base_offset_index(i8* %arg, i32 %i) {
	; CHECK-LABEL: load_i32_by_i8_base_offset_index:			; CHECK-LABEL: load_i32_by_i8_base_offset_index:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl 12(%ecx,%eax), %eax			; CHECK-NEXT: movl 12(%eax,%ecx), %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_base_offset_index:			; CHECK64-LABEL: load_i32_by_i8_base_offset_index:
	; CHECK64: # %bb.0:			; CHECK64: # %bb.0:
	; CHECK64-NEXT: movl %esi, %eax			; CHECK64-NEXT: movl %esi, %eax
	; CHECK64-NEXT: movl 12(%rdi,%rax), %eax			; CHECK64-NEXT: movl 12(%rdi,%rax), %eax
	; CHECK64-NEXT: retq			; CHECK64-NEXT: retq
	%tmp = add nuw nsw i32 %i, 3			%tmp = add nuw nsw i32 %i, 3
	[... 28 lines not shown ...]
	; i8* arg; i32 i;			; i8* arg; i32 i;
	; p = arg + 12;			; p = arg + 12;
	; (i32) p[i + 1] | ((i32) p[i + 2] << 8) | ((i32) p[i + 3] << 16) | ((i32) p[i + 4] << 24)			; (i32) p[i + 1] | ((i32) p[i + 2] << 8) | ((i32) p[i + 3] << 16) | ((i32) p[i + 4] << 24)
	define i32 @load_i32_by_i8_base_offset_index_2(i8* %arg, i32 %i) {			define i32 @load_i32_by_i8_base_offset_index_2(i8* %arg, i32 %i) {
	; CHECK-LABEL: load_i32_by_i8_base_offset_index_2:			; CHECK-LABEL: load_i32_by_i8_base_offset_index_2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl 13(%ecx,%eax), %eax			; CHECK-NEXT: movl 13(%eax,%ecx), %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_base_offset_index_2:			; CHECK64-LABEL: load_i32_by_i8_base_offset_index_2:
	; CHECK64: # %bb.0:			; CHECK64: # %bb.0:
	; CHECK64-NEXT: movl %esi, %eax			; CHECK64-NEXT: movl %esi, %eax
	; CHECK64-NEXT: movl 13(%rdi,%rax), %eax			; CHECK64-NEXT: movl 13(%rdi,%rax), %eax
	; CHECK64-NEXT: retq			; CHECK64-NEXT: retq
	%tmp = add nuw nsw i32 %i, 4			%tmp = add nuw nsw i32 %i, 4
	[... 39 lines not shown ...]
	; In order to fold the pattern above we need to reassociate the address computation			; In order to fold the pattern above we need to reassociate the address computation
	; first. By the time the address computation is reassociated loads are combined to			; first. By the time the address computation is reassociated loads are combined to
	; to zext and aext loads.			; to zext and aext loads.
	define i32 @load_i32_by_i8_zaext_loads(i8* %arg, i32 %arg1) {			define i32 @load_i32_by_i8_zaext_loads(i8* %arg, i32 %arg1) {
	; CHECK-LABEL: load_i32_by_i8_zaext_loads:			; CHECK-LABEL: load_i32_by_i8_zaext_loads:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl 12(%ecx,%eax), %eax			; CHECK-NEXT: movl 12(%eax,%ecx), %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_zaext_loads:			; CHECK64-LABEL: load_i32_by_i8_zaext_loads:
	; CHECK64: # %bb.0:			; CHECK64: # %bb.0:
	; CHECK64-NEXT: movl %esi, %eax			; CHECK64-NEXT: movl %esi, %eax
	; CHECK64-NEXT: movl 12(%rdi,%rax), %eax			; CHECK64-NEXT: movl 12(%rdi,%rax), %eax
	; CHECK64-NEXT: retq			; CHECK64-NEXT: retq
	%tmp = add nuw nsw i32 %arg1, 3			%tmp = add nuw nsw i32 %arg1, 3
	[... 39 lines not shown ...]
	; p3 = arg + i + 3;			; p3 = arg + i + 3;
	;			;
	; (i32) p0[12] | ((i32) p1[12] << 8) | ((i32) p2[12] << 16) | ((i32) p3[12] << 24)			; (i32) p0[12] | ((i32) p1[12] << 8) | ((i32) p2[12] << 16) | ((i32) p3[12] << 24)
	define i32 @load_i32_by_i8_zsext_loads(i8* %arg, i32 %arg1) {			define i32 @load_i32_by_i8_zsext_loads(i8* %arg, i32 %arg1) {
	; CHECK-LABEL: load_i32_by_i8_zsext_loads:			; CHECK-LABEL: load_i32_by_i8_zsext_loads:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl 12(%ecx,%eax), %eax			; CHECK-NEXT: movl 12(%eax,%ecx), %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_zsext_loads:			; CHECK64-LABEL: load_i32_by_i8_zsext_loads:
	; CHECK64: # %bb.0:			; CHECK64: # %bb.0:
	; CHECK64-NEXT: movl %esi, %eax			; CHECK64-NEXT: movl %esi, %eax
	; CHECK64-NEXT: movl 12(%rdi,%rax), %eax			; CHECK64-NEXT: movl 12(%rdi,%rax), %eax
	; CHECK64-NEXT: retq			; CHECK64-NEXT: retq
	%tmp = add nuw nsw i32 %arg1, 3			%tmp = add nuw nsw i32 %arg1, 3
	[... 229 lines not shown ...]
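
For readers unfamiliar with the pattern these load-combine.ll tests exercise, the C-like comments above describe assembling an i32 from four adjacent byte loads, which the check lines show collapsing into a single `movl`. The sketch below spells out the first comment in C++; the function name and the buffer contents used in the self-check are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// p = arg + 12;
// (i32) p[i] | ((i32) p[i + 1] << 8) | ((i32) p[i + 2] << 16) | ((i32) p[i + 3] << 24)
inline uint32_t load_i32_by_i8_base_offset_index(const uint8_t *arg,
                                                 uint32_t i) {
  const uint8_t *p = arg + 12;
  return static_cast<uint32_t>(p[i]) |
         (static_cast<uint32_t>(p[i + 1]) << 8) |
         (static_cast<uint32_t>(p[i + 2]) << 16) |
         (static_cast<uint32_t>(p[i + 3]) << 24);
}

// Builds a small buffer and checks that bytes 78 56 34 12 placed at
// offset 12 + i are combined least-significant byte first.
inline void check_byte_assembly() {
  uint8_t buffer[32] = {};
  const uint32_t i = 3;
  buffer[12 + i + 0] = 0x78;
  buffer[12 + i + 1] = 0x56;
  buffer[12 + i + 2] = 0x34;
  buffer[12 + i + 3] = 0x12;
  assert(load_i32_by_i8_base_offset_index(buffer, i) == 0x12345678u);
}
```

In this revision only the operand order of the x86 addressing mode in those check lines changes (`12(%ecx,%eax)` becomes `12(%eax,%ecx)`); the loads are still combined.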

test/CodeGen/X86/pr32329.ll

	[... 35 lines not shown ...]
	; X86-NEXT: imull %eax, %ecx			; X86-NEXT: imull %eax, %ecx
	; X86-NEXT: addl var_24, %ecx			; X86-NEXT: addl var_24, %ecx
	; X86-NEXT: andl $4194303, %edx # imm = 0x3FFFFF			; X86-NEXT: andl $4194303, %edx # imm = 0x3FFFFF
	; X86-NEXT: leal (%edx,%edx), %ebx			; X86-NEXT: leal (%edx,%edx), %ebx
	; X86-NEXT: subl %eax, %ebx			; X86-NEXT: subl %eax, %ebx
	; X86-NEXT: movl %ebx, %edi			; X86-NEXT: movl %ebx, %edi
	; X86-NEXT: subl %esi, %edi			; X86-NEXT: subl %esi, %edi
	; X86-NEXT: imull %edi, %ecx			; X86-NEXT: imull %edi, %ecx
	; X86-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71			; X86-NEXT: addb $113, %cl
	; X86-NEXT: movl $9, %esi			; X86-NEXT: movl $9, %esi
	; X86-NEXT: xorl %ebp, %ebp			; X86-NEXT: xorl %ebp, %ebp
	; X86-NEXT: shldl %cl, %esi, %ebp			; X86-NEXT: shldl %cl, %esi, %ebp
	; X86-NEXT: shll %cl, %esi			; X86-NEXT: shll %cl, %esi
	; X86-NEXT: testb $32, %cl			; X86-NEXT: testb $32, %cl
	; X86-NEXT: cmovnel %esi, %ebp			; X86-NEXT: cmovnel %esi, %ebp
	; X86-NEXT: movl $0, %ecx			; X86-NEXT: movl $0, %ecx
	; X86-NEXT: cmovnel %ecx, %esi			; X86-NEXT: cmovnel %ecx, %esi
	[... 22 lines not shown ...]
	; X64-NEXT: imull %r9d, %ecx			; X64-NEXT: imull %r9d, %ecx
	; X64-NEXT: addl {{.*}}(%rip), %ecx			; X64-NEXT: addl {{.*}}(%rip), %ecx
	; X64-NEXT: andl $4194303, %eax # imm = 0x3FFFFF			; X64-NEXT: andl $4194303, %eax # imm = 0x3FFFFF
	; X64-NEXT: leal (%rax,%rax), %edi			; X64-NEXT: leal (%rax,%rax), %edi
	; X64-NEXT: subl %r9d, %edi			; X64-NEXT: subl %r9d, %edi
	; X64-NEXT: movl %edi, %esi			; X64-NEXT: movl %edi, %esi
	; X64-NEXT: subl %r8d, %esi			; X64-NEXT: subl %r8d, %esi
	; X64-NEXT: imull %esi, %ecx			; X64-NEXT: imull %esi, %ecx
	; X64-NEXT: addl $-1437483407, %ecx # imm = 0xAA51BE71			; X64-NEXT: addb $113, %cl
	; X64-NEXT: movl $9, %edx			; X64-NEXT: movl $9, %edx
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shlq %cl, %rdx			; X64-NEXT: shlq %cl, %rdx
	; X64-NEXT: movq %rdx, {{.*}}(%rip)			; X64-NEXT: movq %rdx, {{.*}}(%rip)
	; X64-NEXT: cmpl %eax, %esi			; X64-NEXT: cmpl %eax, %esi
	; X64-NEXT: setge {{.*}}(%rip)			; X64-NEXT: setge {{.*}}(%rip)
	; X64-NEXT: imull %r9d, %edi			; X64-NEXT: imull %r9d, %edi
	; X64-NEXT: movb %dil, {{.*}}(%rip)			; X64-NEXT: movb %dil, {{.*}}(%rip)
	[... 33 lines not shown ...]

test/CodeGen/X86/pr32345.ll

	[... 63 lines not shown ...]
	; 6860-NEXT: movzwl var_27, %ecx			; 6860-NEXT: movzwl var_27, %ecx
	; 6860-NEXT: movw %cx, %dx			; 6860-NEXT: movw %cx, %dx
	; 6860-NEXT: xorw %dx, %ax			; 6860-NEXT: xorw %dx, %ax
	; 6860-NEXT: # implicit-def: $esi			; 6860-NEXT: # implicit-def: $esi
	; 6860-NEXT: movw %ax, %si			; 6860-NEXT: movw %ax, %si
	; 6860-NEXT: xorl %ecx, %esi			; 6860-NEXT: xorl %ecx, %esi
	; 6860-NEXT: movw %si, %ax			; 6860-NEXT: movw %si, %ax
	; 6860-NEXT: movzwl %ax, %esi			; 6860-NEXT: movzwl %ax, %esi
	; 6860-NEXT: addl $-16610, %ecx # imm = 0xBF1E
	; 6860-NEXT: movb %cl, %bl			; 6860-NEXT: movb %cl, %bl
				; 6860-NEXT: addb $30, %bl
	; 6860-NEXT: xorl %ecx, %ecx			; 6860-NEXT: xorl %ecx, %ecx
	; 6860-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; 6860-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; 6860-NEXT: movb %bl, %cl			; 6860-NEXT: movb %bl, %cl
	; 6860-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload			; 6860-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
	; 6860-NEXT: shrdl %cl, %edi, %esi			; 6860-NEXT: shrdl %cl, %edi, %esi
	; 6860-NEXT: testb $32, %bl			; 6860-NEXT: testb $32, %bl
	; 6860-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; 6860-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; 6860-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; 6860-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	[... 11 lines not shown ...]
	; 6860-NEXT: popl %edi			; 6860-NEXT: popl %edi
	; 6860-NEXT: popl %ebx			; 6860-NEXT: popl %ebx
	; 6860-NEXT: popl %ebp			; 6860-NEXT: popl %ebp
	; 6860-NEXT: .cfi_def_cfa %esp, 4			; 6860-NEXT: .cfi_def_cfa %esp, 4
	; 6860-NEXT: retl			; 6860-NEXT: retl
	;			;
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # %bb.0: # %bb			; X64: # %bb.0: # %bb
	; X64-NEXT: movzwl {{.*}}(%rip), %eax
	; X64-NEXT: movzwl {{.*}}(%rip), %ecx			; X64-NEXT: movzwl {{.*}}(%rip), %ecx
	; X64-NEXT: movl %ecx, %edx			; X64-NEXT: movzwl {{.*}}(%rip), %eax
	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorw %cx, %ax
	; X64-NEXT: xorl %eax, %edx			; X64-NEXT: xorl %ecx, %eax
	; X64-NEXT: movzwl %dx, %eax			; X64-NEXT: movzwl %ax, %eax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp)			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
	; X64-NEXT: addl $-16610, %ecx # imm = 0xBF1E			; X64-NEXT: addb $30, %cl
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shrq %cl, %rax			; X64-NEXT: shrq %cl, %rax
	; X64-NEXT: movb %al, (%rax)			; X64-NEXT: movb %al, (%rax)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; 686-LABEL: foo:			; 686-LABEL: foo:
	; 686: # %bb.0: # %bb			; 686: # %bb.0: # %bb
	; 686-NEXT: pushl %ebp			; 686-NEXT: pushl %ebp
	; 686-NEXT: .cfi_def_cfa_offset 8			; 686-NEXT: .cfi_def_cfa_offset 8
	; 686-NEXT: .cfi_offset %ebp, -8			; 686-NEXT: .cfi_offset %ebp, -8
	; 686-NEXT: movl %esp, %ebp			; 686-NEXT: movl %esp, %ebp
	; 686-NEXT: .cfi_def_cfa_register %ebp			; 686-NEXT: .cfi_def_cfa_register %ebp
	; 686-NEXT: andl $-8, %esp			; 686-NEXT: andl $-8, %esp
	; 686-NEXT: subl $8, %esp			; 686-NEXT: subl $8, %esp
	; 686-NEXT: movzwl var_22, %eax
	; 686-NEXT: movzwl var_27, %ecx			; 686-NEXT: movzwl var_27, %ecx
	; 686-NEXT: movl %ecx, %edx			; 686-NEXT: movzwl var_22, %eax
	; 686-NEXT: xorl %ecx, %edx			; 686-NEXT: xorw %cx, %ax
	; 686-NEXT: xorl %eax, %edx			; 686-NEXT: xorl %ecx, %eax
	; 686-NEXT: movzwl %dx, %eax			; 686-NEXT: movzwl %ax, %eax
	; 686-NEXT: movl %eax, (%esp)			; 686-NEXT: movl %eax, (%esp)
	; 686-NEXT: movl $0, {{[0-9]+}}(%esp)			; 686-NEXT: movl $0, {{[0-9]+}}(%esp)
	; 686-NEXT: addl $-16610, %ecx # imm = 0xBF1E			; 686-NEXT: addb $30, %cl
	; 686-NEXT: xorl %edx, %edx			; 686-NEXT: xorl %edx, %edx
	; 686-NEXT: shrdl %cl, %edx, %eax			; 686-NEXT: shrdl %cl, %edx, %eax
	; 686-NEXT: testb $32, %cl			; 686-NEXT: testb $32, %cl
	; 686-NEXT: jne .LBB0_2			; 686-NEXT: jne .LBB0_2
	; 686-NEXT: # %bb.1: # %bb			; 686-NEXT: # %bb.1: # %bb
	; 686-NEXT: movl %eax, %edx			; 686-NEXT: movl %eax, %edx
	; 686-NEXT: .LBB0_2: # %bb			; 686-NEXT: .LBB0_2: # %bb
	; 686-NEXT: movb %dl, (%eax)			; 686-NEXT: movb %dl, (%eax)
	[... 34 lines not shown ...]

test/CodeGen/X86/pr33290.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64

	@a = common global i32 0, align 4			@a = common global i32 0, align 4
	@c = common local_unnamed_addr global i8 0, align 1			@c = common local_unnamed_addr global i8 0, align 1
	@b = common local_unnamed_addr global i32* null, align 8			@b = common local_unnamed_addr global i32* null, align 8

	define void @e() {			define void @e() {
	; X86-LABEL: e:			; X86-LABEL: e:
	; X86: # %bb.0: # %entry			; X86: # %bb.0: # %entry
	; X86-NEXT: movl b, %eax			; X86-NEXT: movl b, %eax
	; X86-NEXT: .p2align 4, 0x90			; X86-NEXT: .p2align 4, 0x90
	; X86-NEXT: .LBB0_1: # %for.cond			; X86-NEXT: .LBB0_1: # %for.cond
	; X86-NEXT: # =>This Inner Loop Header: Depth=1			; X86-NEXT: # =>This Inner Loop Header: Depth=1
	; X86-NEXT: movzbl c, %ecx			; X86-NEXT: movzbl c, %ecx
	; X86-NEXT: leal a+2(%ecx), %ecx
	; X86-NEXT: movb $0, c			; X86-NEXT: movb $0, c
				; X86-NEXT: leal a+2(%ecx), %ecx
	; X86-NEXT: movl %ecx, (%eax)			; X86-NEXT: movl %ecx, (%eax)
	; X86-NEXT: jmp .LBB0_1			; X86-NEXT: jmp .LBB0_1
	;			;
	; X64-LABEL: e:			; X64-LABEL: e:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	; X64-NEXT: movq {{.*}}(%rip), %rax			; X64-NEXT: movq {{.*}}(%rip), %rax
	; X64-NEXT: movl $a, %esi			; X64-NEXT: movl $a, %esi
	; X64-NEXT: .p2align 4, 0x90			; X64-NEXT: .p2align 4, 0x90
	[... 25 lines not shown ...]

test/CodeGen/X86/pr34381.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	;RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=slow-incdec | FileCheck %s			;RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=slow-incdec | FileCheck %s

	@var_21 = external constant i32, align 4			@var_21 = external constant i32, align 4
	@var_29 = external constant i8, align 1			@var_29 = external constant i8, align 1
	@var_390 = external global i32, align 4			@var_390 = external global i32, align 4
	@var_11 = external constant i8, align 1			@var_11 = external constant i8, align 1
	@var_370 = external global i8, align 1			@var_370 = external global i8, align 1

	; Function Attrs: noinline nounwind optnone uwtable			; Function Attrs: noinline nounwind optnone uwtable
	define void @_Z3foov() {			define void @_Z3foov() {
	; CHECK-LABEL: _Z3foov:			; CHECK-LABEL: _Z3foov:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movsbl {{.*}}(%rip), %eax			; CHECK-NEXT: movsbl {{.*}}(%rip), %eax
	; CHECK-NEXT: negl %eax			; CHECK-NEXT: negl %eax
	; CHECK-NEXT: cmpl %eax, {{.*}}(%rip)
	; CHECK-NEXT: setb %al
	; CHECK-NEXT: xorl %ecx, %ecx			; CHECK-NEXT: xorl %ecx, %ecx
	; CHECK-NEXT: addb $-1, %al			; CHECK-NEXT: cmpl %eax, {{.*}}(%rip)
	; CHECK-NEXT: sete %cl			; CHECK-NEXT: setb %cl
	; CHECK-NEXT: movl %ecx, {{.*}}(%rip)			; CHECK-NEXT: movl %ecx, {{.*}}(%rip)
	; CHECK-NEXT: movb {{.*}}(%rip), %al			; CHECK-NEXT: movb {{.*}}(%rip), %al
	; CHECK-NEXT: movb %al, {{.*}}(%rip)			; CHECK-NEXT: movb %al, {{.*}}(%rip)
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = load i32, i32* @var_21, align 4			%0 = load i32, i32* @var_21, align 4
	%1 = load i8, i8* @var_29, align 1			%1 = load i8, i8* @var_29, align 1
	%conv = sext i8 %1 to i32			%conv = sext i8 %1 to i32
	[... 15 lines not shown ...]

test/CodeGen/X86/pr35765.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-unknown-linux-gnu %s -o - | FileCheck %s			; RUN: llc -mtriple=x86_64-unknown-linux-gnu %s -o - | FileCheck %s

	@ll = local_unnamed_addr global i64 0, align 8			@ll = local_unnamed_addr global i64 0, align 8
	@x = local_unnamed_addr global i64 2651237805702985558, align 8			@x = local_unnamed_addr global i64 2651237805702985558, align 8
	@s1 = local_unnamed_addr global { i8, i8 } { i8 123, i8 5 }, align 2			@s1 = local_unnamed_addr global { i8, i8 } { i8 123, i8 5 }, align 2
	@s2 = local_unnamed_addr global { i8, i8 } { i8 -122, i8 3 }, align 2			@s2 = local_unnamed_addr global { i8, i8 } { i8 -122, i8 3 }, align 2

	define void @PR35765() {			define void @PR35765() {
	; CHECK-LABEL: PR35765:			; CHECK-LABEL: PR35765:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movzwl {{.*}}(%rip), %ecx			; CHECK-NEXT: movb {{.*}}(%rip), %cl
	; CHECK-NEXT: addl $-1398, %ecx # imm = 0xFA8A			; CHECK-NEXT: addb $-118, %cl
	; CHECK-NEXT: movl $4, %eax			; CHECK-NEXT: movl $4, %eax
	; CHECK-NEXT: # kill: def $cl killed $cl killed $ecx
	; CHECK-NEXT: shll %cl, %eax			; CHECK-NEXT: shll %cl, %eax
	; CHECK-NEXT: movzwl {{.*}}(%rip), %ecx			; CHECK-NEXT: movzwl {{.*}}(%rip), %ecx
	; CHECK-NEXT: movzwl {{.*}}(%rip), %edx			; CHECK-NEXT: movzwl {{.*}}(%rip), %edx
	; CHECK-NEXT: notl %edx			; CHECK-NEXT: notl %edx
	; CHECK-NEXT: orl $63488, %edx # imm = 0xF800			; CHECK-NEXT: orl $63488, %edx # imm = 0xF800
	; CHECK-NEXT: movzwl %dx, %edx			; CHECK-NEXT: movzwl %dx, %edx
	; CHECK-NEXT: orl %ecx, %edx			; CHECK-NEXT: orl %ecx, %edx
	; CHECK-NEXT: xorl %eax, %edx			; CHECK-NEXT: xorl %eax, %edx
	[... 22 lines not shown ...]

test/CodeGen/X86/scheduler-backtracking.ll

	[... 10 lines not shown ...]

	define i256 @test1(i256 %a) nounwind {			define i256 @test1(i256 %a) nounwind {
	; ILP-LABEL: test1:			; ILP-LABEL: test1:
	; ILP: # %bb.0:			; ILP: # %bb.0:
	; ILP-NEXT: pushq %r14			; ILP-NEXT: pushq %r14
	; ILP-NEXT: pushq %rbx			; ILP-NEXT: pushq %rbx
	; ILP-NEXT: movq %rdi, %rax			; ILP-NEXT: movq %rdi, %rax
	; ILP-NEXT: xorl %r8d, %r8d			; ILP-NEXT: xorl %r8d, %r8d
	; ILP-NEXT: incl %esi
	; ILP-NEXT: addb %sil, %sil			; ILP-NEXT: addb %sil, %sil
				; ILP-NEXT: addb $2, %sil
	; ILP-NEXT: orb $1, %sil			; ILP-NEXT: orb $1, %sil
	; ILP-NEXT: movl $1, %r10d			; ILP-NEXT: movl $1, %r10d
	; ILP-NEXT: xorl %r14d, %r14d			; ILP-NEXT: xorl %r14d, %r14d
	; ILP-NEXT: movl %esi, %ecx			; ILP-NEXT: movl %esi, %ecx
	; ILP-NEXT: shldq %cl, %r10, %r14			; ILP-NEXT: shldq %cl, %r10, %r14
	; ILP-NEXT: movl $1, %edx			; ILP-NEXT: movl $1, %edx
	; ILP-NEXT: shlq %cl, %rdx			; ILP-NEXT: shlq %cl, %rdx
	; ILP-NEXT: leal -128(%rsi), %r9d
	; ILP-NEXT: movb $-128, %r11b			; ILP-NEXT: movb $-128, %r11b
	; ILP-NEXT: xorl %ebx, %ebx			; ILP-NEXT: subb %sil, %r11b
				; ILP-NEXT: leal -128(%rsi), %r9d
				; ILP-NEXT: xorl %edi, %edi
	; ILP-NEXT: movl %r9d, %ecx			; ILP-NEXT: movl %r9d, %ecx
	; ILP-NEXT: shldq %cl, %r10, %rbx			; ILP-NEXT: shldq %cl, %r10, %rdi
				; ILP-NEXT: movl $1, %ebx
				; ILP-NEXT: shlq %cl, %rbx
				; ILP-NEXT: movl %r11d, %ecx
				; ILP-NEXT: shrdq %cl, %r8, %r10
	; ILP-NEXT: testb $64, %sil			; ILP-NEXT: testb $64, %sil
	; ILP-NEXT: cmovneq %rdx, %r14			; ILP-NEXT: cmovneq %rdx, %r14
	; ILP-NEXT: cmovneq %r8, %rdx			; ILP-NEXT: cmovneq %r8, %rdx
	; ILP-NEXT: movl $1, %edi
	; ILP-NEXT: shlq %cl, %rdi
	; ILP-NEXT: subb %sil, %r11b
	; ILP-NEXT: movl %r11d, %ecx
	; ILP-NEXT: shrdq %cl, %r8, %r10
	; ILP-NEXT: testb $64, %r11b			; ILP-NEXT: testb $64, %r11b
	; ILP-NEXT: cmovneq %r8, %r10			; ILP-NEXT: cmovneq %r8, %r10
	; ILP-NEXT: testb $64, %r9b			; ILP-NEXT: testb $64, %r9b
	; ILP-NEXT: cmovneq %rdi, %rbx			; ILP-NEXT: cmovneq %rbx, %rdi
	; ILP-NEXT: cmovneq %r8, %rdi			; ILP-NEXT: cmovneq %r8, %rbx
	; ILP-NEXT: testb %sil, %sil			; ILP-NEXT: testb %sil, %sil
	; ILP-NEXT: cmovsq %r8, %r14			; ILP-NEXT: cmovsq %r8, %r14
	; ILP-NEXT: cmovsq %r8, %rdx			; ILP-NEXT: cmovsq %r8, %rdx
	; ILP-NEXT: movq %r14, 8(%rax)			; ILP-NEXT: movq %r14, 8(%rax)
	; ILP-NEXT: movq %rdx, (%rax)			; ILP-NEXT: movq %rdx, (%rax)
	; ILP-NEXT: cmovnsq %r8, %rbx			; ILP-NEXT: cmovnsq %r8, %rdi
	; ILP-NEXT: cmoveq %r8, %rbx
	; ILP-NEXT: movq %rbx, 24(%rax)
	; ILP-NEXT: cmovnsq %r10, %rdi
	; ILP-NEXT: cmoveq %r8, %rdi			; ILP-NEXT: cmoveq %r8, %rdi
	; ILP-NEXT: movq %rdi, 16(%rax)			; ILP-NEXT: movq %rdi, 24(%rax)
				; ILP-NEXT: cmovnsq %r10, %rbx
				; ILP-NEXT: cmoveq %r8, %rbx
				; ILP-NEXT: movq %rbx, 16(%rax)
	; ILP-NEXT: popq %rbx			; ILP-NEXT: popq %rbx
	; ILP-NEXT: popq %r14			; ILP-NEXT: popq %r14
	; ILP-NEXT: retq			; ILP-NEXT: retq
	;			;
	; HYBRID-LABEL: test1:			; HYBRID-LABEL: test1:
	; HYBRID: # %bb.0:			; HYBRID: # %bb.0:
	; HYBRID-NEXT: movq %rdi, %rax			; HYBRID-NEXT: movq %rdi, %rax
	; HYBRID-NEXT: incl %esi
	; HYBRID-NEXT: addb %sil, %sil			; HYBRID-NEXT: addb %sil, %sil
				; HYBRID-NEXT: addb $2, %sil
	; HYBRID-NEXT: orb $1, %sil			; HYBRID-NEXT: orb $1, %sil
	; HYBRID-NEXT: movb $-128, %cl			; HYBRID-NEXT: movb $-128, %cl
	; HYBRID-NEXT: subb %sil, %cl			; HYBRID-NEXT: subb %sil, %cl
	; HYBRID-NEXT: xorl %r8d, %r8d			; HYBRID-NEXT: xorl %r8d, %r8d
	; HYBRID-NEXT: movl $1, %r11d			; HYBRID-NEXT: movl $1, %r11d
	; HYBRID-NEXT: movl $1, %r9d			; HYBRID-NEXT: movl $1, %r9d
	; HYBRID-NEXT: shrdq %cl, %r8, %r9			; HYBRID-NEXT: shrdq %cl, %r8, %r9
	; HYBRID-NEXT: testb $64, %cl			; HYBRID-NEXT: testb $64, %cl
	; HYBRID-NEXT: cmovnsq %r9, %rdx			; HYBRID-NEXT: cmovnsq %r9, %rdx
	; HYBRID-NEXT: cmoveq %r8, %rdx			; HYBRID-NEXT: cmoveq %r8, %rdx
	; HYBRID-NEXT: movq %rdx, 16(%rax)			; HYBRID-NEXT: movq %rdx, 16(%rax)
	; HYBRID-NEXT: retq			; HYBRID-NEXT: retq
	;			;
	; BURR-LABEL: test1:			; BURR-LABEL: test1:
	; BURR: # %bb.0:			; BURR: # %bb.0:
	; BURR-NEXT: movq %rdi, %rax			; BURR-NEXT: movq %rdi, %rax
	; BURR-NEXT: incl %esi
	; BURR-NEXT: addb %sil, %sil			; BURR-NEXT: addb %sil, %sil
				; BURR-NEXT: addb $2, %sil
	; BURR-NEXT: orb $1, %sil			; BURR-NEXT: orb $1, %sil
	; BURR-NEXT: movb $-128, %cl			; BURR-NEXT: movb $-128, %cl
	; BURR-NEXT: subb %sil, %cl			; BURR-NEXT: subb %sil, %cl
	; BURR-NEXT: xorl %r8d, %r8d			; BURR-NEXT: xorl %r8d, %r8d
	; BURR-NEXT: movl $1, %r11d			; BURR-NEXT: movl $1, %r11d
	; BURR-NEXT: movl $1, %r9d			; BURR-NEXT: movl $1, %r9d
	; BURR-NEXT: shrdq %cl, %r8, %r9			; BURR-NEXT: shrdq %cl, %r8, %r9
	; BURR-NEXT: testb $64, %cl			; BURR-NEXT: testb $64, %cl
	; BURR-NEXT: cmoveq %r8, %rdx			; BURR-NEXT: cmoveq %r8, %rdx
	; BURR-NEXT: movq %rdx, 16(%rax)			; BURR-NEXT: movq %rdx, 16(%rax)
	; BURR-NEXT: retq			; BURR-NEXT: retq
	;			;
	; SRC-LABEL: test1:			; SRC-LABEL: test1:
	; SRC: # %bb.0:			; SRC: # %bb.0:
	; SRC-NEXT: pushq %rbx			; SRC-NEXT: pushq %rbx
	; SRC-NEXT: movq %rdi, %rax			; SRC-NEXT: movq %rdi, %rax
	; SRC-NEXT: incl %esi
	; SRC-NEXT: addb %sil, %sil			; SRC-NEXT: addb %sil, %sil
				; SRC-NEXT: addb $2, %sil
	; SRC-NEXT: orb $1, %sil			; SRC-NEXT: orb $1, %sil
	; SRC-NEXT: movb $-128, %cl			; SRC-NEXT: movb $-128, %cl
	; SRC-NEXT: subb %sil, %cl			; SRC-NEXT: subb %sil, %cl
	; SRC-NEXT: xorl %r8d, %r8d			; SRC-NEXT: xorl %r8d, %r8d
	; SRC-NEXT: movl $1, %edi			; SRC-NEXT: movl $1, %edi
	; SRC-NEXT: movl $1, %r10d			; SRC-NEXT: movl $1, %r10d
	; SRC-NEXT: shrdq %cl, %r8, %r10			; SRC-NEXT: shrdq %cl, %r8, %r10
	; SRC-NEXT: testb $64, %cl			; SRC-NEXT: testb $64, %cl
	; SRC-NEXT: popq %rbx			; SRC-NEXT: popq %rbx
	; SRC-NEXT: retq			; SRC-NEXT: retq
	;			;
	; LIN-LABEL: test1:			; LIN-LABEL: test1:
	; LIN: # %bb.0:			; LIN: # %bb.0:
	; LIN-NEXT: movq %rdi, %rax			; LIN-NEXT: movq %rdi, %rax
	; LIN-NEXT: xorl %r9d, %r9d			; LIN-NEXT: xorl %r9d, %r9d
	; LIN-NEXT: movl $1, %r8d			; LIN-NEXT: movl $1, %r8d
	; LIN-NEXT: incl %esi
	; LIN-NEXT: addb %sil, %sil			; LIN-NEXT: addb %sil, %sil
				; LIN-NEXT: addb $2, %sil
	; LIN-NEXT: orb $1, %sil			; LIN-NEXT: orb $1, %sil
	; LIN-NEXT: movl $1, %edx			; LIN-NEXT: movl $1, %edx
	; LIN-NEXT: movl %esi, %ecx			; LIN-NEXT: movl %esi, %ecx
	; LIN-NEXT: shlq %cl, %rdx			; LIN-NEXT: shlq %cl, %rdx
	; LIN-NEXT: testb $64, %sil			; LIN-NEXT: testb $64, %sil
	; LIN-NEXT: movq %rdx, %rcx			; LIN-NEXT: movq %rdx, %rcx
	; LIN-NEXT: cmovneq %r9, %rcx			; LIN-NEXT: cmovneq %r9, %rcx
	; LIN-NEXT: testb %sil, %sil			; LIN-NEXT: testb %sil, %sil