
x86 atomic: optimize a.store(reg op a.load(acquire), release)
Closed, Public

Authored by jfb on Jul 20 2015, 10:46 PM.

Details

Summary

PR24191 finds that the expected memory-register operations aren't generated when a relaxed { load ; modify ; store } sequence is used. This is similar to PR17281, which was addressed in D4796, but only for memory-immediate operations (and only for memory orderings up to acquire and release). This patch also handles some floating-point operations.

Diff Detail

Event Timeline

jfb updated this revision to Diff 30230.Jul 20 2015, 10:46 PM
jfb retitled this revision from to x86 atomic: optimize a.store(reg op a.load(acquire), release).
jfb updated this object.
jfb added reviewers: reames, kcc, dvyukov, nadav.
jfb added a subscriber: llvm-commits.
dvyukov edited edge metadata.Jul 21 2015, 12:00 AM

Will this optimization transform:

int foo() {
   int r = atomic_load_n(&x, __ATOMIC_RELAXED);
   atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
   return r;
}

? If yes, how?

jfb updated this revision to Diff 30266.Jul 21 2015, 9:53 AM
jfb edited edge metadata.
  • Address dvyukov's comment: test that self-add doesn't fold.
jfb added a comment.Jul 21 2015, 9:53 AM

> Will this optimization transform:
>
> int foo() {
>    int r = atomic_load_n(&x, __ATOMIC_RELAXED);
>    atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
>    return r;
> }
>
> ? If yes, how?

Good point, I added test add_32r_self to ensure that this doesn't happen, and that the pattern matching figures out dependencies properly.

dvyukov added a comment.

In D11382#209066, @jfb wrote:

>> Will this optimization transform:
>>
>> int foo() {
>>    int r = atomic_load_n(&x, __ATOMIC_RELAXED);
>>    atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
>>    return r;
>> }
>>
>> ? If yes, how?
>
> Good point, I added test add_32r_self to ensure that this doesn't happen, and that the pattern matching figures out dependencies properly.

I am glad that the comment was useful, but I actually asked a different thing :)
My example does not contain a self-add. It contains two uses of a load result, and one of those uses could potentially be folded. My concern is that the code could be compiled as:

MOV [addr], r
ADD [addr], 1
MOV r, rax
RET

or to:

ADD [addr], 1
MOV [addr], rax
RET

Both of which would be incorrect transformations -- two loads instead of one.
I guess this transformation should require that the folded store is the only use of the load result.

jfb updated this revision to Diff 30373.Jul 22 2015, 11:21 AM
  • Add test suggested by dvyukov making sure that the load isn't duplicated.
jfb added a comment.Jul 22 2015, 11:23 AM
> In D11382#209066, @jfb wrote:
>
>>> Will this optimization transform:
>>>
>>> int foo() {
>>>    int r = atomic_load_n(&x, __ATOMIC_RELAXED);
>>>    atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
>>>    return r;
>>> }
>>>
>>> ? If yes, how?
>>
>> Good point, I added test add_32r_self to ensure that this doesn't happen, and that the pattern matching figures out dependencies properly.
>
> I am glad that the comment was useful, but I actually asked a different thing :)
> My example does not contain a self-add. It contains two uses of a load result, and one of those uses could potentially be folded. My concern is that the code could be compiled as:
>
> MOV [addr], r
> ADD [addr], 1
> MOV r, rax
> RET
>
> or to:
>
> ADD [addr], 1
> MOV [addr], rax
> RET
>
> Both of which would be incorrect transformations -- two loads instead of one.
> I guess this transformation should require that the folded store is the only use of the load result.

Oh sorry, I totally misunderstood you! I added a test for this. IIUC it can't happen, because the entire matched pattern is replaced with a pseudo-instruction, so an escaping intermediate result wouldn't have a def anymore.

jfb updated this revision to Diff 30374.Jul 22 2015, 11:25 AM
  • Also handle floating-point memory-register addition.
jfb updated this revision to Diff 30375.Jul 22 2015, 11:30 AM
  • Add one more FIXME.

LGTM but wait for somebody else

jfb updated this revision to Diff 30386.Jul 22 2015, 12:45 PM
  • atomic_mi.ll: add a colon after FileCheck LABEL checks, to prevent partial string matches.
jfb updated this revision to Diff 30387.Jul 22 2015, 12:50 PM
  • [NFC] name pseudo-instructions more consistently.
jfb updated this object.Jul 22 2015, 12:51 PM
morisset edited edge metadata.Jul 23 2015, 6:46 AM

LGTM for the integer part.

I am not really convinced that the floating-point change belongs in the same patch: it is conceptually different, and does not seem to share any code with the rest of the patch. It is also not obvious to me why it is only tested on X86_64; a comment would be appreciated.

test/CodeGen/X86/atomic_mi.ll
Line 820: Why?
Line 837: Why?

jfb updated this revision to Diff 30498.Jul 23 2015, 9:49 AM
jfb marked 2 inline comments as done.
jfb edited edge metadata.
  • Comment on x86-32 testing of atomics and SSE.
jfb added a comment.Jul 23 2015, 9:54 AM

> LGTM for the integer part.
>
> I am not really convinced that the floating-point change belongs in the same patch: it is conceptually different, and does not seem to share any code with the rest of the patch. It is also not obvious to me why it is only tested on X86_64; a comment would be appreciated.

I improved the comment for x86-32. LLVM's code generation was silly when I was testing it out: the instructions I use are in SSE, and even with -mattr=+sse (and +sse2) it wasn't particularly good code, even without using any atomics. I figure x86-32 optimization of floating-point atomics isn't particularly useful compared to x86-64 at this point in time.

jfb updated this revision to Diff 30499.Jul 23 2015, 10:06 AM
  • *sd instructions are in SSE2.
jfb added a comment.Jul 23 2015, 10:08 AM

Actually, *sd instructions are SSE2, so I fixed the pattern matcher :)

To be specific, the x86-32 code with -mattr=+sse generates the following:

fadd_32r:                               # @fadd_32r
	# ...
	movl	12(%esp), %eax
	movl	(%eax), %ecx
	movl	%ecx, (%esp)
	movss	(%esp), %xmm0           # xmm0 = mem[0],zero,zero,zero
	addss	16(%esp), %xmm0
	movss	%xmm0, 4(%esp)
	movl	4(%esp), %ecx
	movl	%ecx, (%eax)
	addl	$8, %esp
	retl
	# ...
fadd_64r:                               # @fadd_64r
	# ...
	movl	32(%esp), %esi
	xorl	%eax, %eax
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ecx, %ecx
	lock		cmpxchg8b	(%esi)
	movl	%edx, 12(%esp)
	movl	%eax, 8(%esp)
	fldl	8(%esp)
	faddl	36(%esp)
	fstpl	(%esp)
	movl	(%esp), %ebx
	movl	4(%esp), %ecx
	movl	(%esi), %eax
	movl	4(%esi), %edx
	# ...

Part of the problem is calling convention on x86-32, and part of the problem is x87 for 64-bit.

With -mattr=+sse,+sse2 the generated code becomes:

fadd_32r:                               # @fadd_32r
	# ...
	movss	8(%esp), %xmm0          # xmm0 = mem[0],zero,zero,zero
	movl	4(%esp), %eax
	addss	(%eax), %xmm0
	movss	%xmm0, (%eax)
	retl
	# ...
fadd_64r:                               # @fadd_64r
	# ...
	movl	32(%esp), %esi
	xorl	%eax, %eax
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ecx, %ecx
	lock		cmpxchg8b	(%esi)
	movl	%edx, 12(%esp)
	movl	%eax, 8(%esp)
	movsd	8(%esp), %xmm0          # xmm0 = mem[0],zero
	addsd	36(%esp), %xmm0
	movsd	%xmm0, (%esp)
	movl	(%esp), %ebx
	movl	4(%esp), %ecx
	movl	(%esi), %eax
	movl	4(%esi), %edx

I could do something with this, but I'm a bit wary of how the calling convention works out (or rather, that the code generation might change slightly and throw things off).

jfb added a subscriber: echristo.

@echristo suggested I add @chandlerc because this patch says "atomic". Note @dvyukov's LGTM above.

@chandlerc suggested I add @pete and @t.p.northover as reviewers.

pete accepted this revision.Aug 5 2015, 10:17 AM
pete edited edge metadata.

I would have preferred to see the renaming in another patch, but at this point I don't think it's worth the effort to split it out. It's a mechanical change, not a change in behavior, and TableGen would have made it clear if it didn't like the change.

So, I agree with @dvyukov, LGTM.

Cheers,
Pete

lib/Target/X86/X86InstrCompiler.td
765

This was there prior to your change, but I wonder if we should have a later patch (by you or me or anyone else) to consider removing this cast. We can do so by taking 'SDNode op' instead of a string. For example:

diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index 49dc318..9168713 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -757,26 +757,26 @@ defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add",
    extremely late to prevent them from being accidentally reordered in the backend
    (see below the RELEASE_MOV* / ACQUIRE_MOV* pseudo-instructions) */
-multiclass RELEASE_BINOP_MI<string op> {
+multiclass RELEASE_BINOP_MI<SDNode op> {
   def NAME#8mi : I<0, Pseudo, (outs), (ins i8mem:$dst, i8imm:$src),
       "#RELEASE_BINOP PSEUDO!",
-      [(atomic_store_8 addr:$dst, (!cast<PatFrag>(op)
+      [(atomic_store_8 addr:$dst, (op
           (atomic_load_8 addr:$dst), (i8 imm:$src)))]>;
   // NAME#16 is not generated as 16-bit arithmetic instructions are considered
   // costly and avoided as far as possible by this backend anyway
   def NAME#32mi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:$src),
       "#RELEASE_BINOP PSEUDO!",
-      [(atomic_store_32 addr:$dst, (!cast<PatFrag>(op)
+      [(atomic_store_32 addr:$dst, (op
           (atomic_load_32 addr:$dst), (i32 imm:$src)))]>;
   def NAME#64mi32 : I<0, Pseudo, (outs), (ins i64mem:$dst, i64i32imm:$src),
       "#RELEASE_BINOP PSEUDO!",
-      [(atomic_store_64 addr:$dst, (!cast<PatFrag>(op)
+      [(atomic_store_64 addr:$dst, (op
           (atomic_load_64 addr:$dst), (i64immSExt32:$src)))]>;
 }
-defm RELEASE_ADD : RELEASE_BINOP_MI<"add">;
-defm RELEASE_AND : RELEASE_BINOP_MI<"and">;
-defm RELEASE_OR : RELEASE_BINOP_MI<"or">;
-defm RELEASE_XOR : RELEASE_BINOP_MI<"xor">;
+defm RELEASE_ADD : RELEASE_BINOP_MI<add>;
+defm RELEASE_AND : RELEASE_BINOP_MI<and>;
+defm RELEASE_OR : RELEASE_BINOP_MI<or>;
+defm RELEASE_XOR : RELEASE_BINOP_MI<xor>;

This revision is now accepted and ready to land.Aug 5 2015, 10:17 AM
This revision was automatically updated to reflect the committed changes.
jfb added inline comments.Aug 5 2015, 2:06 PM
lib/Target/X86/X86InstrCompiler.td
765

I can send a follow-up.

jfb marked an inline comment as done.Aug 5 2015, 4:16 PM
jfb added inline comments.
lib/Target/X86/X86InstrCompiler.td
765

Done in D11788.