This is an archive of the discontinued LLVM Phabricator instance.


Differential D48606

[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
ClosedPublic

Authored by craig.topper on Jun 26 2018, 1:50 PM.


Details

Reviewers

lebedev.ri
spatel
RKSimon

Commits

rL335754: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit…

Summary

If we are just modifying a single bit at a variable bit position, we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -2) and doing a binop. These instructions also ignore the upper bits of their index input, so we can also remove an 'and' on the index if one is present.

I'll see if I can spread some multiclass goodness on the td file to reduce the repetition.

Fixes PR37938
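
As a source-level illustration of the three idioms the summary describes, here is a minimal C sketch. The function names are hypothetical, and whether a particular compiler emits btr/bts/btc for them depends on the target and compiler version. The sketch only shows the arithmetic, including why rotating -2 left by n gives the same mask as ~(1 << n).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left with the count reduced modulo 32, so a count of 0 is well defined. */
static uint32_t rotl32(uint32_t v, unsigned n) {
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* Clear bit n: and x, (rotl -2, n) in the patterns, written here as x & ~(1 << n). */
uint32_t clear_bit(uint32_t x, unsigned n) {
    return x & ~(UINT32_C(1) << (n & 31));
}

/* Set bit n: or x, (shl 1, n). */
uint32_t set_bit(uint32_t x, unsigned n) {
    return x | (UINT32_C(1) << (n & 31));
}

/* Toggle bit n: xor x, (shl 1, n). */
uint32_t toggle_bit(uint32_t x, unsigned n) {
    return x ^ (UINT32_C(1) << (n & 31));
}

/* The same clear, spelled with the rotate form of the mask. */
uint32_t clear_bit_rotl(uint32_t x, unsigned n) {
    return x & rotl32(UINT32_C(0xFFFFFFFE), n);
}

/* Checks that rotl(-2, n) == ~(1 << n) for every in-range bit position. */
void check_rotate_identity(void) {
    for (unsigned n = 0; n < 32; ++n)
        assert(rotl32(UINT32_C(0xFFFFFFFE), n) == (uint32_t)~(UINT32_C(1) << n));
}
```

The explicit `n & 31` keeps the C shifts well defined for any n; it corresponds to the index masking that the register forms of these instructions make redundant.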

Diff Detail

Event Timeline

craig.topper created this revision. Jun 26 2018, 1:50 PM

craig.topper edited the summary of this revision.

No tests for the 16-bit case.
Patterns with memory operands are not handled.
Patterns with immediates are not handled.

(I can follow up with that if wanted; I mostly have the tests anyway.)

I'll add the 16-bit tests.

Memory patterns shouldn't be done. BTR/BTS/BTC with a memory operand have strange semantics (they don't ignore the unused bits of the index) and are really slow because they recompute the memory address using all bits of the index register.
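
To make the difference concrete, here is a small C model of the two forms for a 32-bit operand. It is a sketch of the architectural behaviour as documented for these instructions, not an emulator; the function names are hypothetical. The register form uses only the low 5 bits of the index, while the memory form treats the index as a signed bit offset from the base address and may touch a different word entirely.

```c
#include <stdint.h>

/* Register form (32-bit operand): only the low 5 bits of the index select
   the bit, so index 35 behaves like index 3. */
uint32_t bts32_reg_model(uint32_t x, uint32_t index) {
    return x | (UINT32_C(1) << (index & 31));
}

/* Memory form (32-bit operand): the index is a signed bit offset from the
   base address. The word accessed is base[floor(index / 32)] and the bit
   within that word is index mod 32. */
void bts32_mem_model(uint32_t *base, int32_t index) {
    int32_t word = index / 32;   /* C division truncates toward zero */
    int32_t bit = index % 32;
    if (bit < 0) {               /* adjust to floor division for negative offsets */
        word -= 1;
        bit += 32;
    }
    base[word] |= UINT32_C(1) << bit;
}
```

With index 35, the register model sets bit 3 of its operand, whereas the memory model sets bit 3 of the word after the one at the base address. That is why an or/and/xor of a loaded value cannot simply be folded into the memory form.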

What do you mean patterns with immediates?

In D48606#1144023, @craig.topper wrote:

I'll add the 16-bit tests.

Memory patterns shouldn't be done. BTR/BTS/BTC with a memory operand have strange semantics (they don't ignore the unused bits of the index) and are really slow because they recompute the memory address using all bits of the index register.

You still probably want some test to document that?

What do you mean patterns with immediates?

https://godbolt.org/g/GH5p7Z
Unless of course that is also not profitable.

Added the 16-bit test. The 16-bit btr pattern on x86-64 fails to match because the 'and' got promoted to i32 but the rotate didn't. The btc/bts test promoted everything to i32 so we just use the i32 pattern.
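
One consequence of selecting the 16-bit bts/btc cases through the i32 pattern is that the index is then interpreted modulo 32 rather than modulo 16. The following C sketch (hypothetical names, a model rather than compiler code) suggests why the masked 16-bit tests in btc_bts_btr.ll still show an `andb $15` before `btsl`/`btcl`: dropping a mask of 15 would change the result whenever bit 4 of the index is set.

```c
#include <stdint.h>

/* Reference semantics of the masked i16 test:
   or i16 %x, (shl i16 1, (and i16 %n, 15)). */
uint16_t bts16_mask_reference(uint16_t x, uint16_t n) {
    return (uint16_t)(x | (1u << (n & 15)));
}

/* Model of the 32-bit register BTS: the index is taken modulo 32. */
static uint32_t bts32_model(uint32_t x, uint32_t n) {
    return x | (UINT32_C(1) << (n & 31));
}

/* Promoted to 32 bits with the mask of 15 kept. */
uint16_t bts16_promoted_keep_mask(uint16_t x, uint16_t n) {
    return (uint16_t)bts32_model(x, (uint32_t)(n & 15));
}

/* Promoted to 32 bits with the mask dropped: for n = 20 this sets bit 20,
   which is lost on truncation, instead of bit 4. */
uint16_t bts16_promoted_drop_mask(uint16_t x, uint16_t n) {
    return (uint16_t)bts32_model(x, n);
}
```

For the 32-bit and 64-bit patterns the mask (31 or 63) matches the width the instruction already applies, so removing it does not change the result.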

The i686 tests fail to select the new pattern because the parameters come off the stack and we still prioritize load folding over these new patterns. In larger sections of real-world code where the data doesn't come directly from a load, we should manage to fold. I can try to add more tests that have a preamble of other operations to separate them from the load if we want.

For the immediate case we prefer AND/OR/XOR because they have a reciprocal throughput of 0.25 on Haswell and Skylake, while BTR/BTS/BTC have a reciprocal throughput of 0.5. We had a discussion about what to do for 64-bit and/or/xor when the immediate is larger than a 32-bit sign-extended value and thus requires a movabsq. We currently use bts/btr/btc for those under optsize; see x86-64-bittest-logic.ll.
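
The 64-bit immediate case mentioned above can be made concrete with a small C sketch. The helper names are hypothetical; the code only classifies which constant bit positions produce a mask that fits a sign-extended 32-bit immediate and which would need the constant materialized separately (the movabsq case).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit pattern v is representable as a sign-extended 32-bit
   immediate, i.e. bits 63..31 are all zero or all one. */
bool fits_simm32(uint64_t v) {
    uint64_t high = v >> 31;   /* the upper 33 bits */
    return high == 0 || high == UINT64_C(0x1FFFFFFFF);
}

/* Mask used by OR/XOR to set or toggle constant bit n (0..63). */
bool or_xor_mask_fits_simm32(unsigned n) {
    return fits_simm32(UINT64_C(1) << (n & 63));
}

/* Mask used by AND to clear constant bit n (0..63). */
bool and_mask_fits_simm32(unsigned n) {
    return fits_simm32(~(UINT64_C(1) << (n & 63)));
}
```

Under this model, bit positions 0 through 30 fit the immediate forms, while positions 31 through 63 do not, which is the group the comment says is handled with bts/btr/btc under optsize.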

xbolva00 added a subscriber: xbolva00. Jun 26 2018, 2:36 PM

RKSimon added inline comments. Jun 26 2018, 2:38 PM

test/CodeGen/X86/btc_bts_btr.ll
4	Aren't X86/X64 the wrong way around?

RKSimon added inline comments. Jun 26 2018, 2:40 PM

test/CodeGen/X86/btc_bts_btr.ll
4	Sorry - old revision!

Add more tests with explicit loads.

Should I add the RMW memory test cases that we definitely shouldn't use BTS/BTR/BTC for?

In D48606#1144111, @craig.topper wrote:

Should I add the RMW memory test cases that we definitely shouldn't use BTS/BTR/BTC for?

I would, and explicitly spell _dontfold in names so it is obvious that these are intentional.

Add the RMW test cases

lebedev.ri added inline comments. Jun 26 2018, 11:48 PM

lib/Target/X86/X86InstrCompiler.td

1811–1813

It doesn't look like the 16-bit cases matched?

# *** IR Dump After X86 DAG->DAG Instruction Selection ***:
# Machine code for function btr_16: IsSSA, TracksLiveness
Function Live Ins: $edi in %0, $esi in %1

bb.0 (%ir-block.0):
  liveins: $edi, $esi
  %1:gr32 = COPY $esi
  %0:gr32 = COPY $edi
  %2:gr8 = COPY %1.sub_8bit:gr32
  %3:gr16 = MOV16ri -2
  $cl = COPY %2:gr8
  %4:gr16 = ROL16rCL %3:gr16, implicit-def dead $eflags, implicit $cl
  %6:gr32 = IMPLICIT_DEF
  %5:gr32 = INSERT_SUBREG %6:gr32, killed %4:gr16, %subreg.sub_16bit
  %7:gr32 = AND32rr %0:gr32, killed %5:gr32, implicit-def dead $eflags
  %8:gr16 = COPY %7.sub_16bit:gr32
  $ax = COPY %8:gr16
  RET 0, $ax

test/CodeGen/X86/btc_bts_btr.ll

513–517

I wonder if something like

mov (%rdi), %edi
btr %esi, %edi

would be better still than not folding at all?

The 16-bit BTR fails to match because the 'and' got promoted to 32-bit and the rotate didn't. We need to fix the promotion of the rotate. I don't think we should try to pattern match the bit width mismatch. In reality, C type promotion rules make it likely that the IR for an "unsigned short" case is already in i32 before we even get to the backend, so it's probably not a huge issue. So I don't think that should hold up this patch. I'm happy to add a FIXME and/or file a bug.
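
The point about C type promotion can be illustrated with a short sketch (function names are hypothetical). Operands narrower than int are promoted before arithmetic, so a single-bit clear on an "unsigned short" is computed at int width and only truncated at the end. The exact IR a front end produces for this is not shown here and may vary.

```c
#include <stddef.h>
#include <stdint.h>

/* Clears bit n of a 16-bit value. x is promoted to int before the AND, and
   the mask ~(1 << n) is an int, so the operation happens at int width. The
   caller is assumed to pass n below 16. */
uint16_t clear_bit_u16(uint16_t x, unsigned n) {
    return (uint16_t)(x & ~(1 << n));
}

/* Size of the type of "a & b" for two 16-bit operands; integer promotion
   makes this the size of int rather than the size of uint16_t. */
size_t promoted_and_size(void) {
    uint16_t a = 1, b = 2;
    return sizeof(a & b);
}
```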

test/CodeGen/X86/btc_bts_btr.ll
513–517	Probably, but it's not exactly easy to do. Tablegen generates the match order by ranking how many SDNodes are covered by the pattern. The regular memory pattern for and/or/xor covers more nodes, so it gets higher priority. To override the priority you have to add an AddedComplexity line to the pattern. But I worry that significantly bumping the priority of this pattern to override the load pattern may have other effects and require other priorities to be adjusted. I might be being overly cautious, but I'd like to keep the simple approach. gcc doesn't do this when there is a memory op either.

In D48606#1144491, @craig.topper wrote:

The 16-bit BTR fails to match because the 'and' got promoted to 32-bit and the rotate didn't. We need to fix the promotion of the rotate. I don't think we should try to pattern match the bit width mismatch. In reality, C type promotion rules make it likely that the IR for an "unsigned short" case is already in i32 before we even get to the backend, so it's probably not a huge issue. So I don't think that should hold up this patch. I'm happy to add a FIXME and/or file a bug.

Oh right, https://godbolt.org/g/qsd6aF, only the last case isn't folded if i try it locally.

OK, looks good, other than nits, but you may want to wait for a second opinion.
(Please split into two commits - the tests first)

lib/Target/X86/X86InstrCompiler.td
1822	I would love to know how it would be possible to deduplicate the cases with mask and without mask, since that is essentially the same problem i have in D48491 :)
1833	In D48606#1144491, @craig.topper wrote:

The 16-bit BTR fails to match because the 'and' got promoted to 32-bit and the rotate didn't. We need to fix the promotion of the rotate.

Yeah, I think for now this line should be disabled, and a FIXME added/bug filed.

This revision is now accepted and ready to land. Jun 27 2018, 1:57 AM

RKSimon added inline comments. Jun 27 2018, 5:49 AM

test/CodeGen/X86/btc_bts_btr.ll
135	Separate issue but we should really have movzbl here

craig.topper added inline comments. Jun 27 2018, 8:22 AM

test/CodeGen/X86/btc_bts_btr.ll
135	FixupBWInst only promotes byte loads in loops due to code size increase.

Diffusion mentioned this in rL335753: [X86] Add test cases for D48606. Jun 27 2018, 9:52 AM

craig.topper closed this revision. Jun 28 2018, 10:43 AM

craig.topper added a commit: rL335754: [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit….

Revision Contents

Path

Size

lib/

Target/

X86/

X86InstrCompiler.td

31 lines

test/

CodeGen/

X86/

btc_bts_btr.ll

959 lines

Diff 152975

lib/Target/X86/X86InstrCompiler.td

(1,798 lines not shown; context: def : Pat<(shl (loadi32 addr:$src1), (and GR8:$src2, immShift32)),)
(i32 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
def : Pat<(shl (loadi64 addr:$src1), (and GR8:$src2, immShift64)),
(SHLX64rm addr:$src1,
(INSERT_SUBREG
(i64 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
}
}

		// Use BTR/BTS/BTC for clearing/setting/toggling a bit in a variable location.
		multiclass one_bit_patterns<RegisterClass RC, ValueType VT, Instruction BTR,
		Instruction BTS, Instruction BTC,
		ImmLeaf ImmShift> {
		def : Pat<(and RC:$src1, (rotl -2, GR8:$src2)),
		(BTR RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
		def : Pat<(or RC:$src1, (shl 1, GR8:$src2)),
		(BTS RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
		def : Pat<(xor RC:$src1, (shl 1, GR8:$src2)),
		(BTC RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;

		// Similar to above, but removing unneeded masking of the shift amount.
		def : Pat<(and RC:$src1, (rotl -2, (and GR8:$src2, ImmShift))),
		(BTR RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
		def : Pat<(or RC:$src1, (shl 1, (and GR8:$src2, ImmShift))),
		(BTS RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
		def : Pat<(xor RC:$src1, (shl 1, (and GR8:$src2, ImmShift))),
		(BTC RC:$src1,
		(INSERT_SUBREG (VT (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
		}

		defm : one_bit_patterns<GR16, i16, BTR16rr, BTS16rr, BTC16rr, immShift16>;
		defm : one_bit_patterns<GR32, i32, BTR32rr, BTS32rr, BTC32rr, immShift32>;
		defm : one_bit_patterns<GR64, i64, BTR64rr, BTS64rr, BTC64rr, immShift64>;


// (anyext (setcc_carry)) -> (setcc_carry)
def : Pat<(i16 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
(SETB_C16r)>;
def : Pat<(i32 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
(SETB_C32r)>;
def : Pat<(i32 (anyext (i16 (X86setcc_c X86_COND_B, EFLAGS)))),
(SETB_C32r)>;

(260 lines not shown)

test/CodeGen/X86/btc_bts_btr.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-pc-linux | FileCheck %s --check-prefix=X64
				; RUN: llc < %s -mtriple=i386-pc-linux | FileCheck %s --check-prefix=X86

				define i16 @btr_16(i16 %x, i16 %n) {
				; X64-LABEL: btr_16:
				; X64: # %bb.0:
				; X64-NEXT: movw $-2, %ax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolw %cl, %ax
				; X64-NEXT: andl %edi, %eax
				; X64-NEXT: # kill: def $ax killed $ax killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_16:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movw $-2, %ax
				; X86-NEXT: rolw %cl, %ax
				; X86-NEXT: andw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: retl
				%1 = shl i16 1, %n
				%2 = xor i16 %1, -1
				%3 = and i16 %x, %2
				ret i16 %3
				}

				define i16 @bts_16(i16 %x, i16 %n) {
				; X64-LABEL: bts_16:
				; X64: # %bb.0:
				; X64-NEXT: btsl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_16:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = shl i16 1, %n
				%2 = or i16 %x, %1
				ret i16 %2
				}

				define i16 @btc_16(i16 %x, i16 %n) {
				; X64-LABEL: btc_16:
				; X64: # %bb.0:
				; X64-NEXT: btcl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_16:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = shl i16 1, %n
				%2 = xor i16 %x, %1
				ret i16 %2
				}

				define i32 @btr_32(i32 %x, i32 %n) {
				; X64-LABEL: btr_32:
				; X64: # %bb.0:
				; X64-NEXT: btrl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_32:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $-2, %eax
				; X86-NEXT: roll %cl, %eax
				; X86-NEXT: andl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i32 1, %n
				%2 = xor i32 %1, -1
				%3 = and i32 %x, %2
				ret i32 %3
				}

				define i32 @bts_32(i32 %x, i32 %n) {
				; X64-LABEL: bts_32:
				; X64: # %bb.0:
				; X64-NEXT: btsl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_32:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i32 1, %n
				%2 = or i32 %x, %1
				ret i32 %2
				}

				define i32 @btc_32(i32 %x, i32 %n) {
				; X64-LABEL: btc_32:
				; X64: # %bb.0:
				; X64-NEXT: btcl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_32:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i32 1, %n
				%2 = xor i32 %x, %1
				ret i32 %2
				}

				define i64 @btr_64(i64 %x, i64 %n) {
				; X64-LABEL: btr_64:
				; X64: # %bb.0:
				; X64-NEXT: btrq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_64:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB6_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB6_2:
				; X86-NEXT: notl %edx
				; X86-NEXT: notl %eax
				; X86-NEXT: andl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: andl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i64 1, %n
				%2 = xor i64 %1, -1
				%3 = and i64 %x, %2
				ret i64 %3
				}

				define i64 @bts_64(i64 %x, i64 %n) {
				; X64-LABEL: bts_64:
				; X64: # %bb.0:
				; X64-NEXT: btsq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_64:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB7_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB7_2:
				; X86-NEXT: orl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i64 1, %n
				%2 = or i64 %x, %1
				ret i64 %2
				}

				define i64 @btc_64(i64 %x, i64 %n) {
				; X64-LABEL: btc_64:
				; X64: # %bb.0:
				; X64-NEXT: btcq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_64:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB8_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB8_2:
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = shl i64 1, %n
				%2 = xor i64 %x, %1
				ret i64 %2
				}

				define i16 @btr_16_mask(i16 %x, i16 %n) {
				; X64-LABEL: btr_16_mask:
				; X64: # %bb.0:
				; X64-NEXT: movw $-2, %ax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolw %cl, %ax
				; X64-NEXT: andl %edi, %eax
				; X64-NEXT: # kill: def $ax killed $ax killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_16_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movw $-2, %ax
				; X86-NEXT: rolw %cl, %ax
				; X86-NEXT: andw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: retl
				%1 = and i16 %n, 15
				%2 = shl i16 1, %1
				%3 = xor i16 %2, -1
				%4 = and i16 %x, %3
				ret i16 %4
				}

				define i16 @bts_16_mask(i16 %x, i16 %n) {
				; X64-LABEL: bts_16_mask:
				; X64: # %bb.0:
				; X64-NEXT: andb $15, %sil
				; X64-NEXT: btsl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_16_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: andb $15, %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = and i16 %n, 15
				%2 = shl i16 1, %1
				%3 = or i16 %x, %2
				ret i16 %3
				}

				define i16 @btc_16_mask(i16 %x, i16 %n) {
				; X64-LABEL: btc_16_mask:
				; X64: # %bb.0:
				; X64-NEXT: andb $15, %sil
				; X64-NEXT: btcl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_16_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: andb $15, %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorw {{[0-9]+}}(%esp), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = and i16 %n, 15
				%2 = shl i16 1, %1
				%3 = xor i16 %x, %2
				ret i16 %3
				}

				define i32 @btr_32_mask(i32 %x, i32 %n) {
				; X64-LABEL: btr_32_mask:
				; X64: # %bb.0:
				; X64-NEXT: btrl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_32_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $-2, %eax
				; X86-NEXT: roll %cl, %eax
				; X86-NEXT: andl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i32 %n, 31
				%2 = shl i32 1, %1
				%3 = xor i32 %2, -1
				%4 = and i32 %x, %3
				ret i32 %4
				}

				define i32 @bts_32_mask(i32 %x, i32 %n) {
				; X64-LABEL: bts_32_mask:
				; X64: # %bb.0:
				; X64-NEXT: btsl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_32_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i32 %n, 31
				%2 = shl i32 1, %1
				%3 = or i32 %x, %2
				ret i32 %3
				}

				define i32 @btc_32_mask(i32 %x, i32 %n) {
				; X64-LABEL: btc_32_mask:
				; X64: # %bb.0:
				; X64-NEXT: btcl %esi, %edi
				; X64-NEXT: movl %edi, %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_32_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i32 %n, 31
				%2 = shl i32 1, %1
				%3 = xor i32 %x, %2
				ret i32 %3
				}

				define i64 @btr_64_mask(i64 %x, i64 %n) {
				; X64-LABEL: btr_64_mask:
				; X64: # %bb.0:
				; X64-NEXT: btrq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_64_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB15_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB15_2:
				; X86-NEXT: notl %edx
				; X86-NEXT: notl %eax
				; X86-NEXT: andl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: andl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i64 %n, 63
				%2 = shl i64 1, %1
				%3 = xor i64 %2, -1
				%4 = and i64 %x, %3
				ret i64 %4
				}

				define i64 @bts_64_mask(i64 %x, i64 %n) {
				; X64-LABEL: bts_64_mask:
				; X64: # %bb.0:
				; X64-NEXT: btsq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_64_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB16_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB16_2:
				; X86-NEXT: orl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i64 %n, 63
				%2 = shl i64 1, %1
				%3 = or i64 %x, %2
				ret i64 %3
				}

				define i64 @btc_64_mask(i64 %x, i64 %n) {
				; X64-LABEL: btc_64_mask:
				; X64: # %bb.0:
				; X64-NEXT: btcq %rsi, %rdi
				; X64-NEXT: movq %rdi, %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_64_mask:
				; X86: # %bb.0:
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB17_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB17_2:
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: xorl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: retl
				%1 = and i64 %n, 63
				%2 = shl i64 1, %1
				%3 = xor i64 %x, %2
				ret i64 %3
				}

				; Tests below use loads and we favor folding those over matching btc/btr/bts.

				define i16 @btr_16_load(i16* %x, i16 %n) {
				; X64-LABEL: btr_16_load:
				; X64: # %bb.0:
				; X64-NEXT: movw $-2, %ax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolw %cl, %ax
				; X64-NEXT: andw (%rdi), %ax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_16_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movw $-2, %ax
				; X86-NEXT: rolw %cl, %ax
				; X86-NEXT: andw (%edx), %ax
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = xor i16 %2, -1
				%4 = and i16 %1, %3
				ret i16 %4
				}

				define i16 @bts_16_load(i16* %x, i16 %n) {
				; X64-LABEL: bts_16_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: orw (%rdi), %ax
				; X64-NEXT: # kill: def $ax killed $ax killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_16_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orw (%edx), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = or i16 %1, %2
				ret i16 %3
				}

				define i16 @btc_16_load(i16* %x, i16 %n) {
				; X64-LABEL: btc_16_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: xorw (%rdi), %ax
				; X64-NEXT: # kill: def $ax killed $ax killed $eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_16_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorw (%edx), %ax
				; X86-NEXT: # kill: def $ax killed $ax killed $eax
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = xor i16 %1, %2
				ret i16 %3
				}

				define i32 @btr_32_load(i32* %x, i32 %n) {
				; X64-LABEL: btr_32_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $-2, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: roll %cl, %eax
				; X64-NEXT: andl (%rdi), %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_32_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $-2, %eax
				; X86-NEXT: roll %cl, %eax
				; X86-NEXT: andl (%edx), %eax
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = xor i32 %2, -1
				%4 = and i32 %1, %3
				ret i32 %4
				}

				define i32 @bts_32_load(i32* %x, i32 %n) {
				; X64-LABEL: bts_32_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: orl (%rdi), %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_32_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: orl (%edx), %eax
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = or i32 %1, %2
				ret i32 %3
				}

				define i32 @btc_32_load(i32* %x, i32 %n) {
				; X64-LABEL: btc_32_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: xorl (%rdi), %eax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_32_load:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: xorl (%edx), %eax
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = xor i32 %1, %2
				ret i32 %3
				}

				define i64 @btr_64_load(i64* %x, i64 %n) {
				; X64-LABEL: btr_64_load:
				; X64: # %bb.0:
				; X64-NEXT: movq $-2, %rax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolq %cl, %rax
				; X64-NEXT: andq (%rdi), %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_64_load:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB24_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB24_2:
				; X86-NEXT: notl %edx
				; X86-NEXT: notl %eax
				; X86-NEXT: andl 4(%esi), %edx
				; X86-NEXT: andl (%esi), %eax
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = xor i64 %2, -1
				%4 = and i64 %1, %3
				ret i64 %4
				}

				define i64 @bts_64_load(i64* %x, i64 %n) {
				; X64-LABEL: bts_64_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shlq %cl, %rax
				; X64-NEXT: orq (%rdi), %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_64_load:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB25_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB25_2:
				; X86-NEXT: orl 4(%esi), %edx
				; X86-NEXT: orl (%esi), %eax
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = or i64 %1, %2
				ret i64 %3
				}

				define i64 @btc_64_load(i64* %x, i64 %n) {
				; X64-LABEL: btc_64_load:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shlq %cl, %rax
				; X64-NEXT: xorq (%rdi), %rax
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_64_load:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %eax
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: shldl %cl, %eax, %edx
				; X86-NEXT: shll %cl, %eax
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB26_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %eax, %edx
				; X86-NEXT: xorl %eax, %eax
				; X86-NEXT: .LBB26_2:
				; X86-NEXT: xorl 4(%esi), %edx
				; X86-NEXT: xorl (%esi), %eax
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = xor i64 %1, %2
				ret i64 %3
				}

				; For the tests below, we definitely shouldn't fold them to the memory forms
				; of BTR/BTS/BTC as they have very different semantics from their register
				; counterparts.

				define void @btr_16_dont_fold(i16* %x, i16 %n) {
				; X64-LABEL: btr_16_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movw $-2, %ax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolw %cl, %ax
				; X64-NEXT: andw %ax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_16_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movw $-2, %dx
				; X86-NEXT: rolw %cl, %dx
				; X86-NEXT: andw %dx, (%eax)
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = xor i16 %2, -1
				%4 = and i16 %1, %3
				store i16 %4, i16* %x
				ret void
				}

				define void @bts_16_dont_fold(i16* %x, i16 %n) {
				; X64-LABEL: bts_16_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: orw %ax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_16_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: orw %dx, (%eax)
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = or i16 %1, %2
				store i16 %3, i16* %x
				ret void
				}

				define void @btc_16_dont_fold(i16* %x, i16 %n) {
				; X64-LABEL: btc_16_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: xorw %ax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_16_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: xorw %dx, (%eax)
				; X86-NEXT: retl
				%1 = load i16, i16* %x
				%2 = shl i16 1, %n
				%3 = xor i16 %1, %2
				store i16 %3, i16* %x
				ret void
				}

				define void @btr_32_dont_fold(i32* %x, i32 %n) {
				; X64-LABEL: btr_32_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $-2, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: roll %cl, %eax
				; X64-NEXT: andl %eax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_32_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $-2, %edx
				; X86-NEXT: roll %cl, %edx
				; X86-NEXT: andl %edx, (%eax)
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = xor i32 %2, -1
				%4 = and i32 %1, %3
				store i32 %4, i32* %x
				ret void
				}

				define void @bts_32_dont_fold(i32* %x, i32 %n) {
				; X64-LABEL: bts_32_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: orl %eax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_32_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: orl %edx, (%eax)
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = or i32 %1, %2
				store i32 %3, i32* %x
				ret void
				}

				define void @btc_32_dont_fold(i32* %x, i32 %n) {
				; X64-LABEL: btc_32_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shll %cl, %eax
				; X64-NEXT: xorl %eax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_32_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: xorl %edx, (%eax)
				; X86-NEXT: retl
				%1 = load i32, i32* %x
				%2 = shl i32 1, %n
				%3 = xor i32 %1, %2
				store i32 %3, i32* %x
				ret void
				}

				define void @btr_64_dont_fold(i64* %x, i64 %n) {
				; X64-LABEL: btr_64_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movq $-2, %rax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: rolq %cl, %rax
				; X64-NEXT: andq %rax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btr_64_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: xorl %esi, %esi
				; X86-NEXT: shldl %cl, %edx, %esi
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB33_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %edx, %esi
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: .LBB33_2:
				; X86-NEXT: notl %esi
				; X86-NEXT: notl %edx
				; X86-NEXT: andl %esi, 4(%eax)
				; X86-NEXT: andl %edx, (%eax)
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = xor i64 %2, -1
				%4 = and i64 %1, %3
				store i64 %4, i64* %x
				ret void
				}

				define void @bts_64_dont_fold(i64* %x, i64 %n) {
				; X64-LABEL: bts_64_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shlq %cl, %rax
				; X64-NEXT: orq %rax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: bts_64_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: xorl %esi, %esi
				; X86-NEXT: shldl %cl, %edx, %esi
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB34_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %edx, %esi
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: .LBB34_2:
				; X86-NEXT: orl %esi, 4(%eax)
				; X86-NEXT: orl %edx, (%eax)
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = or i64 %1, %2
				store i64 %3, i64* %x
				ret void
				}

				define void @btc_64_dont_fold(i64* %x, i64 %n) {
				; X64-LABEL: btc_64_dont_fold:
				; X64: # %bb.0:
				; X64-NEXT: movl $1, %eax
				; X64-NEXT: movl %esi, %ecx
				; X64-NEXT: shlq %cl, %rax
				; X64-NEXT: xorq %rax, (%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: btc_64_dont_fold:
				; X86: # %bb.0:
				; X86-NEXT: pushl %esi
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %esi, -8
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
				; X86-NEXT: movl $1, %edx
				; X86-NEXT: xorl %esi, %esi
				; X86-NEXT: shldl %cl, %edx, %esi
				; X86-NEXT: shll %cl, %edx
				; X86-NEXT: testb $32, %cl
				; X86-NEXT: je .LBB35_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: movl %edx, %esi
				; X86-NEXT: xorl %edx, %edx
				; X86-NEXT: .LBB35_2:
				; X86-NEXT: xorl %esi, 4(%eax)
				; X86-NEXT: xorl %edx, (%eax)
				; X86-NEXT: popl %esi
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = load i64, i64* %x
				%2 = shl i64 1, %n
				%3 = xor i64 %1, %2
				store i64 %3, i64* %x
				ret void
				}
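
The comment ahead of the `*_dont_fold` tests says the memory forms of BTR/BTS/BTC have very different semantics from the register forms. As a rough illustration only, the C sketch below models that difference for a 32-bit BTS: the register form reduces the bit index modulo the operand width, while the memory form treats the index as a signed bit offset from the base address and may touch a different dword. The helper names `bts32_reg` and `bts32_mem` are hypothetical and are not part of the patch or of LLVM; this is a model of the instruction behaviour under those assumptions, not generated code.

```c
#include <stdint.h>

/* Hypothetical model of the register form of BTS with a 32-bit operand:
   the bit index is reduced modulo 32, so upper index bits are ignored. */
static uint32_t bts32_reg(uint32_t value, uint32_t index) {
    return value | (UINT32_C(1) << (index & 31));
}

/* Hypothetical model of the memory form of BTS with a 32-bit operand:
   the index is a signed bit offset relative to `base`, so an index outside
   0..31 selects a neighbouring dword instead of wrapping around. */
static void bts32_mem(uint32_t *base, int32_t index) {
    /* Floor division, so negative offsets address earlier dwords. */
    int32_t word = index / 32;
    if (index % 32 < 0) {
        word -= 1;
    }
    int32_t bit = index - word * 32; /* always in 0..31 */
    base[word] |= UINT32_C(1) << bit;
}
```

With an index of 33, `bts32_reg` sets bit 1 of the value it is given, whereas `bts32_mem` sets bit 1 of the dword after the one `base` points at. A `shl`/`or` sequence on a loaded value matches the register behaviour only, which is why these tests expect the load/modify/store sequence to be kept.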

Revision Contents

Diff 152975

lib/Target/X86/X86InstrCompiler.td

test/CodeGen/X86/btc_bts_btr.ll