This is an archive of the discontinued LLVM Phabricator instance.

Differential D15471

[IR] Add support for floating pointer atomic loads and stores
ClosedPublic

Authored by reames on Dec 11 2015, 4:15 PM.

Details

Reviewers

jyknight
jfb
hfinkel

Commits

rG61a24ab6cc30: [IR] Add support for floating pointer atomic loads and stores
rL255737: [IR] Add support for floating pointer atomic loads and stores

Summary

This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify an atomic float operation was to bitcast the pointer to an i32 pointer, load the value as an i32, then bitcast to a float. At its most basic, this patch simply moves this expansion step to the point we start lowering to the backend.
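
For readers less familiar with the bitcast idiom described above, here is a minimal source-level analogue in standard C++. It is only a sketch of the shape of the idiom, not code from the patch: the helper names `loadFloatViaInt` and `storeFloatViaInt` are hypothetical, and it assumes `float` is 32 bits wide.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: atomically load the 32-bit pattern, then reinterpret
// it as a float. This mirrors "load as i32, then bitcast to float".
inline float loadFloatViaInt(
    const std::atomic<std::uint32_t> &Slot,
    std::memory_order Order = std::memory_order_seq_cst) {
  std::uint32_t Bits = Slot.load(Order); // the integer atomic load
  float Result;
  std::memcpy(&Result, &Bits, sizeof Result); // the "bitcast" to float
  return Result;
}

// Hypothetical helper: reinterpret the float as its bit pattern, then
// atomically store that integer.
inline void storeFloatViaInt(
    std::atomic<std::uint32_t> &Slot, float Value,
    std::memory_order Order = std::memory_order_seq_cst) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof Bits); // the "bitcast" to integer
  Slot.store(Bits, Order);                 // the integer atomic store
}
```

The atomicity comes entirely from the integer access; the conversions on either side are plain value reinterpretations, which is why moving them later in the pipeline does not change the generated code.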

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would make more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Diff Detail

Event Timeline

reames updated this revision to Diff 42599. Dec 11 2015, 4:15 PM

reames retitled this revision to [IR] Add support for floating pointer and vector atomic loads and stores.

reames updated this object.

reames added reviewers: jfb, hfinkel, jyknight.

reames added a subscriber: llvm-commits.

Minor drop by comments inline.

docs/LangRef.rst
6836–6837	Minor & optional language suggestion: right now this can be read as "is >= 2^8". Also, I'm not sure if the "greater than or equal to a target specific limit" is better than "greater than a target specific limit". Might be clearer to phrase this as: ... be a type whose bit width is a power of two, is not less than 8, and is greater than a target specific limit ...
lib/CodeGen/AtomicExpandPass.cpp
199	Why not have `IntegerType *` as the return type?
209	`F` is a `llvm::Module`, so please call it `M`. Also, there is an `Instruction::getModule`.
218	Very minor: spacing before `=`.
294	Again, I think `auto *M = SI->getModule()` is better.
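
The LangRef constraint under discussion (bit width is a power of two, at least eight, and no larger than a target-specific limit) can be read as three independent conditions. A rough sketch of that reading follows; the function name and the `TargetMaxBits` parameter are hypothetical and do not correspond to any LLVM API.

```cpp
#include <cstdint>

// Hypothetical predicate for the three conditions in the LangRef wording.
// TargetMaxBits stands in for the "target-specific size limit"; no real
// target's limit is assumed here.
constexpr bool isLegalAtomicBitWidth(std::uint64_t Bits,
                                     std::uint64_t TargetMaxBits) {
  return Bits >= 8                    // not less than eight
         && (Bits & (Bits - 1)) == 0  // a power of two
         && Bits <= TargetMaxBits;    // within the target's limit
}
```

Written this way, 256 (that is, 2^8) is not special: 8, 16, 32 and so on all qualify as long as they stay within the limit.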

Address Sanjoy's review.

Does this have any implications for alias analysis, especially type-based? The C++ model means memory locations are always atomic, so this isn't un-aliasing non-atomics further, but does LLVM's AAs understand that atomics can be non-integral now?

As explained in one of my comments, I think this patch should only do FP and not vectors.

docs/LangRef.rst
6837	This isn't correct because it can't be any type: it has to be an integer, FP or vector. I'm not sure I'm really enthused by the addition of vector here because it has odd ramifications: What if the vector's size isn't a power of 2? What about vector atomicity and alignment? Some usecases may be OK with element-wise atomicity only (with undefined ordering). What if the vector contains pointers? Right now you can't have atomic load/store of pointers, it seems odd to allow vectors of pointers. I'd drop vectors from this patch, and clarify the documentation to mention integer and floating-point.
lib/CodeGen/AtomicExpandPass.cpp
142	Add a string: `assert(foo && "bar");`
153	Ditto.
202	Shouldn't this be `getStoreSizeInBits()`? If `getStoreSizeInBits() != getSizeInBits()` then we'll have problems because the alignment may be wrong. I'd assert that they're the same, as well as checking the number is a power of 2 (because lulz fp80).
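
To make the store-size concern on line 202 concrete, here is a small sketch. It assumes the store size is the bit width rounded up to a whole number of bytes, which is my understanding of how LLVM computes it; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: store size is the bit width rounded up to whole bytes.
constexpr std::uint64_t storeSizeInBits(std::uint64_t SizeInBits) {
  return ((SizeInBits + 7) / 8) * 8;
}

// Hypothetical analogue of picking the iN width for a non-integer type:
// refuse types whose in-memory size includes padding bits.
inline std::uint64_t integerWidthFor(std::uint64_t SizeInBits) {
  const std::uint64_t BitWidth = storeSizeInBits(SizeInBits);
  assert(BitWidth == SizeInBits && "type has padding bits when stored");
  return BitWidth;
}
```

Under that assumption an 80-bit type has equal size and store size, so this check alone would not reject it; the power-of-two requirement has to be enforced separately.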

In D15471#309318, @jfb wrote:

Does this have any implications for alias analysis, especially type-based? The C++ model means memory locations are always atomic, so this isn't un-aliasing non-atomics further, but does LLVM's AAs understand that atomics can be non-integral now?

I'm not sure what you're trying to ask here specifically, but I have no reason to believe this influences AA in any way. Any AA which is using the *llvm type* to prove no alias is wrong and should be fixed. TBAA is entirely orthogonal and unaffected.

As explained in one of my comments, I think this patch should only do FP and not vectors.

I'm okay with this for the moment. Will upload a simplified patch shortly.

reames added inline comments. Dec 14 2015, 1:05 PM

docs/LangRef.rst
6837	Let's move this to the llvm-dev thread. For the record, your point 3 is based on a wrong assumption. We do support atomic loads and stores of pointers.
lib/CodeGen/AtomicExpandPass.cpp
142	Will do. For the record, I feel the message adds absolutely nothing here given the code context, but I don't care enough to argue the point.
202	I'll switch methods and add the first assert since it's slightly non-obvious. The power of two is enforced by the verifier.

Address JF's comments and remove the vector support for the moment.

Forgot the new LangRef changes in the last update.

bcraig added a subscriber: bcraig. Dec 14 2015, 1:25 PM

bcraig added inline comments.

test/CodeGen/X86/atomic-non-integer.ll
50	All of your llc tests are currently testing unordered accesses. The interesting code gen on X86 is with seq_cst stores. I recommend adding tests for those, and ensuring that you get the appropriate [lock] xchg operations.
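
As a source-level illustration of the seq_cst case raised here, the sketch below stores a `double` by exchanging its 64-bit pattern and discarding the old value, which is the shape an `xchg`-based store takes. The helper name is hypothetical, a 64-bit `double` is assumed, and whether a given compiler emits `xchg` for it is not guaranteed.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Hypothetical helper: sequentially consistent store of a double, expressed
// as an atomic exchange on the equivalent-width integer.
inline void storeDoubleSeqCst(std::atomic<std::uint64_t> &Slot, double Value) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof Bits); // reinterpret the value as bits
  // The previous contents are read by the exchange but intentionally unused.
  (void)Slot.exchange(Bits, std::memory_order_seq_cst);
}
```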

Add a couple of seq_cst tests per Ben's request.

LGTM... but that doesn't mean much. Thanks for adding the extra tests.

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

lib/CodeGen/AtomicExpandPass.cpp
207	"integral"
284	"integral"
test/CodeGen/X86/atomic-non-integer.ll
2	Add a reference to: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
7	Why does this involve an f2h conversion? Is it a calling convention thing where half is passed as its integer equivalent? It would be worth documenting, I find it surprising.
33	Same.
55	For each of the `xchg` below, can you also `CHECK-NOT: lock` since the x86 manual states that the `lock` prefix is implicit.
test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
6	Also test `volatile` and `addressspace` combinations.

In D15471#311041, @jfb wrote:

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

This will be caught by the verifier. Will add a test.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

I haven't checked, but the details of the lowering (including correctness!) are beyond the scope of this change. The lowering will be whatever you would have gotten with the previous bitcast idiom. *All* this change is doing is removing the need for that idiom. If we need to revise lowering, that should and will be a separate change. I will add a test to check that we lower to *something*, but that's all that should be part of this change.

I'll make the testing changes requested. With the assumption those will be done before submission, can I get an official LGTM?

I'm noticing a trend for this patch to keep snowballing to include more and more lowering questions; I specifically want to cut that off. This change *shouldn't* be changing lowering in any way from what frontends get today. That's the entire point of the patch.

lib/CodeGen/AtomicExpandPass.cpp
284	Will fix.
test/CodeGen/X86/atomic-non-integer.ll
2	This is, or should be, documented in the code. Since I didn't write the lowering code for this and don't know what reasoning led to this particular emission, I'd rather not falsely cite something due to the later confusion it might cause.
7	Frankly, I have no clue. We emit the same conversion when doing a non-atomic store, so it's not related to the changes in this patch.
55	This is an implementation detail that is irrelevant to this functionality. This is a) tested elsewhere, and b) irrelevant to the correctness of the code generation here.

In D15471#311072, @reames wrote:

In D15471#311041, @jfb wrote:

For x86, can you add a negative test for fp80 to make sure it doesn't work? Or will that test only hit an assert? I'm assuming that it should fail validation.

This will be caught by the verifier. Will add a test.

OK ty.

Also, add fp128. For now I'm assuming we'll do lock cmpxchg16b on processors which support it, and a call to the runtime lock from compiler-rt otherwise (__atomic_load_16 and __atomic_store_16 from lib/builtins/atomic.c which I'm not sure work?).

I haven't checked, but the details of the lowering (including correctness!) are beyond the scope of this change. The lowering will be whatever you would have gotten with the previous bitcast idiom. *All* this change is doing is removing the need for that idiom. If we need to revise lowering, that should and will be a separate change. I will add a test to check that we lower to *something*, but that's all that should be part of this change.

Oh yeah I just want to know that we generate something. If it's wrong the fix should definitely be separate.

I'll make the testing changes requested. With the assumption those will be done before submission, can I get an official LGTM?

Yes.

I'm noticing a trend for this patch to keep snowballing to include more and more lowering questions; I specifically want to cut that off. This change *shouldn't* be changing lowering in any way from what frontends get today. That's the entire point of the patch.

I want to improve our test coverage and understand what the tests are doing. I agree that lowering improvements are separate.

test/CodeGen/X86/atomic-non-integer.ll
2	Could you document this at the top of the test? It's not clear from the name of the test that you're not checking correctness of the generated code.
7	I'd rather know if the test is correct, and fix elsewhere if not: otherwise it's hard to reason about this test when fixing bugs. I can't find anything about `half` or `fp16` in the calling convention, but the signature (`short __gnu_f2h_ieee(float)`) leads me to believe that the ABI passes things as `float` and then converts to `short` as a proxy for `half`. I think a cleaner test wouldn't pass a `half` in registers but would instead pass two pointers. That makes the test way easier to understand IMO.
55	Oh you're right, `test/CodeGen/X86/atomic_mi.ll` tests this and `git blame` shows a familiar name on that!

add various requested tests

address JF's comments with regards to halfs.

lgtm after the two "integral" typo fixes.

test/Verifier/atomics.ll
4	Ha, that's not the best error message since `x86_mmx` is a floating-point type. I'll rework it in D15512 if that's OK with you.

This revision is now accepted and ready to land. Dec 15 2015, 4:15 PM

reames added inline comments. Dec 15 2015, 4:48 PM

test/CodeGen/X86/atomic-non-integer.ll
2	I added a note to the top of the file. Let me know if you want something more.
7	This comment doesn't really parse for me. If you want to suggest a change here, please raise it on the mailing lists so that someone more knowledgeable than I can comment.
test/Verifier/atomics.ll
4	Not according to LLVM's isFloatingPointTy it's not. Surprised me too. Fixing that might end up being a much larger change though.

Closed by commit rL255737: [IR] Add support for floating pointer atomic loads and stores (authored by reames). Dec 15 2015, 4:52 PM

This revision was automatically updated to reflect the committed changes.

jfb mentioned this in D85900: [mlir] do not use llvm.cmpxchg with floats. Aug 13 2020, 3:00 PM

danilaml mentioned this in D60394: [X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead. Dec 30 2021, 4:55 AM

Revision Contents

Path

Size

docs/

LangRef.rst

25 lines

lib/

CodeGen/

AtomicExpandPass.cpp

96 lines

IR/

Verifier.cpp

8 lines

test/

CodeGen/

X86/

atomic-non-integer.ll

108 lines

Transforms/

AtomicExpand/

X86/

expand-atomic-non-integer.ll

82 lines

Verifier/

atomics.ll

14 lines

Diff 42920

docs/LangRef.rst

	Show First 20 Lines • Show All 6,827 Lines • ▼ Show 20 Lines
	then the optimizer is not allowed to modify the number or order of			then the optimizer is not allowed to modify the number or order of
	execution of this ``load`` with other :ref:`volatile			execution of this ``load`` with other :ref:`volatile
	operations <volatile>`.			operations <volatile>`.

	If the ``load`` is marked as ``atomic``, it takes an extra			If the ``load`` is marked as ``atomic``, it takes an extra
	:ref:`ordering <ordering>` and optional ``singlethread`` argument. The			:ref:`ordering <ordering>` and optional ``singlethread`` argument. The
	``release`` and ``acq_rel`` orderings are not valid on ``load``			``release`` and ``acq_rel`` orderings are not valid on ``load``
	instructions. Atomic loads produce :ref:`defined <memmodel>` results			instructions. Atomic loads produce :ref:`defined <memmodel>` results
	when they may see multiple atomic stores. The type of the pointee must			when they may see multiple atomic stores. The type of the pointee must
	be an integer type whose bit width is a power of two greater than or			be an integer or floating point type whose bit width is a power of two,
	equal to eight and less than or equal to a target-specific size limit.			greater than or equal to eight, and less than or equal to a
	``align`` must be explicitly specified on atomic loads, and the load has			target-specific size limit. ``align`` must be explicitly specified on
	undefined behavior if the alignment is not set to a value which is at			atomic loads, and the load has undefined behavior if the alignment is
	least the size in bytes of the pointee. ``!nontemporal`` does not have			not set to a value which is at least the size in bytes of the pointee.
	any defined semantics for atomic loads.			``!nontemporal`` does not have any defined semantics for atomic loads.

	The optional constant ``align`` argument specifies the alignment of the			The optional constant ``align`` argument specifies the alignment of the
	operation (that is, the alignment of the memory address). A value of 0			operation (that is, the alignment of the memory address). A value of 0
	or an omitted ``align`` argument means that the operation has the ABI			or an omitted ``align`` argument means that the operation has the ABI
	alignment for the target. It is the responsibility of the code emitter			alignment for the target. It is the responsibility of the code emitter
	to ensure that the alignment information is correct. Overestimating the			to ensure that the alignment information is correct. Overestimating the
	alignment results in undefined behavior. Underestimating the alignment			alignment results in undefined behavior. Underestimating the alignment
	may produce less efficient code. An alignment of 1 is always safe. The			may produce less efficient code. An alignment of 1 is always safe. The
	▲ Show 20 Lines • Show All 103 Lines • ▼ Show 20 Lines
	execution of this ``store`` with other :ref:`volatile			execution of this ``store`` with other :ref:`volatile
	operations <volatile>`.			operations <volatile>`.

	If the ``store`` is marked as ``atomic``, it takes an extra			If the ``store`` is marked as ``atomic``, it takes an extra
	:ref:`ordering <ordering>` and optional ``singlethread`` argument. The			:ref:`ordering <ordering>` and optional ``singlethread`` argument. The
	``acquire`` and ``acq_rel`` orderings aren't valid on ``store``			``acquire`` and ``acq_rel`` orderings aren't valid on ``store``
	instructions. Atomic loads produce :ref:`defined <memmodel>` results			instructions. Atomic loads produce :ref:`defined <memmodel>` results
	when they may see multiple atomic stores. The type of the pointee must			when they may see multiple atomic stores. The type of the pointee must
	be an integer type whose bit width is a power of two greater than or			be an integer or floating point type whose bit width is a power of two,
	equal to eight and less than or equal to a target-specific size limit.			greater than or equal to eight, and less than or equal to a
	``align`` must be explicitly specified on atomic stores, and the store			target-specific size limit. ``align`` must be explicitly specified
	has undefined behavior if the alignment is not set to a value which is			on atomic stores, and the store has undefined behavior if the alignment
	at least the size in bytes of the pointee. ``!nontemporal`` does not			is not set to a value which is at least the size in bytes of the
	have any defined semantics for atomic stores.			pointee. ``!nontemporal`` does not have any defined semantics for
				atomic stores.

	The optional constant ``align`` argument specifies the alignment of the			The optional constant ``align`` argument specifies the alignment of the
	operation (that is, the alignment of the memory address). A value of 0			operation (that is, the alignment of the memory address). A value of 0
	or an omitted ``align`` argument means that the operation has the ABI			or an omitted ``align`` argument means that the operation has the ABI
	alignment for the target. It is the responsibility of the code emitter			alignment for the target. It is the responsibility of the code emitter
	to ensure that the alignment information is correct. Overestimating the			to ensure that the alignment information is correct. Overestimating the
	alignment results in undefined behavior. Underestimating the			alignment results in undefined behavior. Underestimating the
	alignment may produce less efficient code. An alignment of 1 is always			alignment may produce less efficient code. An alignment of 1 is always
	▲ Show 20 Lines • Show All 5,083 Lines • Show Last 20 Lines

lib/CodeGen/AtomicExpandPass.cpp

//===-- AtomicExpandPass.cpp - Expand atomic instructions -------===//		//===-- AtomicExpandPass.cpp - Expand atomic instructions -------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains a pass (at IR level) to replace atomic instructions with		// This file contains a pass (at IR level) to replace atomic instructions with
// either (intrinsic-based) load-linked/store-conditional loops or		// target specific instruction which implement the same semantics in a way
// AtomicCmpXchg.		// which better fits the target backend. This can include the use of either
		// (intrinsic-based) load-linked/store-conditional loops, AtomicCmpXchg, or
		// type coercions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/AtomicExpandUtils.h"		#include "llvm/CodeGen/AtomicExpandUtils.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
Show All 20 Lines	explicit AtomicExpand(const TargetMachine *TM = nullptr)
initializeAtomicExpandPass(*PassRegistry::getPassRegistry());		initializeAtomicExpandPass(*PassRegistry::getPassRegistry());
}		}

bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

private:		private:
bool bracketInstWithFences(Instruction *I, AtomicOrdering Order,		bool bracketInstWithFences(Instruction *I, AtomicOrdering Order,
bool IsStore, bool IsLoad);		bool IsStore, bool IsLoad);
		IntegerType *getCorrespondingIntegerType(Type *T, const DataLayout &DL);
		LoadInst *convertAtomicLoadToIntegerType(LoadInst *LI);
bool tryExpandAtomicLoad(LoadInst *LI);		bool tryExpandAtomicLoad(LoadInst *LI);
bool expandAtomicLoadToLL(LoadInst *LI);		bool expandAtomicLoadToLL(LoadInst *LI);
bool expandAtomicLoadToCmpXchg(LoadInst *LI);		bool expandAtomicLoadToCmpXchg(LoadInst *LI);
		StoreInst *convertAtomicStoreToIntegerType(StoreInst *SI);
bool expandAtomicStore(StoreInst *SI);		bool expandAtomicStore(StoreInst *SI);
bool tryExpandAtomicRMW(AtomicRMWInst *AI);		bool tryExpandAtomicRMW(AtomicRMWInst *AI);
bool expandAtomicOpToLLSC(		bool expandAtomicOpToLLSC(
Instruction *I, Value *Addr, AtomicOrdering MemOpOrder,		Instruction *I, Value *Addr, AtomicOrdering MemOpOrder,
std::function<Value *(IRBuilder<> &, Value *)> PerformOp);		std::function<Value *(IRBuilder<> &, Value *)> PerformOp);
bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);		bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);
bool isIdempotentRMW(AtomicRMWInst *AI);		bool isIdempotentRMW(AtomicRMWInst *AI);
bool simplifyIdempotentRMW(AtomicRMWInst *AI);		bool simplifyIdempotentRMW(AtomicRMWInst *AI);
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	if (TLI->getInsertFencesForAtomic()) {
}		}

if (FenceOrdering != Monotonic) {		if (FenceOrdering != Monotonic) {
MadeChange |= bracketInstWithFences(I, FenceOrdering, IsStore, IsLoad);		MadeChange |= bracketInstWithFences(I, FenceOrdering, IsStore, IsLoad);
}		}
}		}

if (LI) {		if (LI) {
		if (LI->getType()->isFloatingPointTy()) {
		// TODO: add a TLI hook to control this so that each target can
		// convert to lowering the original type one at a time.
		LI = convertAtomicLoadToIntegerType(LI);
		assert(LI->getType()->isIntegerTy() && "invariant broken");
		MadeChange = true;
		}

MadeChange |= tryExpandAtomicLoad(LI);		MadeChange |= tryExpandAtomicLoad(LI);
} else if (SI && TLI->shouldExpandAtomicStoreInIR(SI)) {		} else if (SI) {
		if (SI->getValueOperand()->getType()->isFloatingPointTy()) {
		// TODO: add a TLI hook to control this so that each target can
		// convert to lowering the original type one at a time.
		SI = convertAtomicStoreToIntegerType(SI);
		assert(SI->getValueOperand()->getType()->isIntegerTy() &&
		"invariant broken");
		MadeChange = true;
		}

		if (TLI->shouldExpandAtomicStoreInIR(SI))
MadeChange |= expandAtomicStore(SI);		MadeChange |= expandAtomicStore(SI);
} else if (RMWI) {		} else if (RMWI) {
// There are two different ways of expanding RMW instructions:		// There are two different ways of expanding RMW instructions:
// - into a load if it is idempotent		// - into a load if it is idempotent
// - into a Cmpxchg/LL-SC loop otherwise		// - into a Cmpxchg/LL-SC loop otherwise
// we try them in that order.		// we try them in that order.

if (isIdempotentRMW(RMWI) && simplifyIdempotentRMW(RMWI)) {		if (isIdempotentRMW(RMWI) && simplifyIdempotentRMW(RMWI)) {
MadeChange = true;		MadeChange = true;
Show All 23 Lines	bool AtomicExpand::bracketInstWithFences(Instruction *I, AtomicOrdering Order,
if (TrailingFence) {		if (TrailingFence) {
TrailingFence->removeFromParent();		TrailingFence->removeFromParent();
TrailingFence->insertAfter(I);		TrailingFence->insertAfter(I);
}		}

return (LeadingFence || TrailingFence);		return (LeadingFence || TrailingFence);
}		}

		/// Get the iX type with the same bitwidth as T.
		IntegerType *AtomicExpand::getCorrespondingIntegerType(Type *T,
		const DataLayout &DL) {
		EVT VT = TLI->getValueType(DL, T);
		unsigned BitWidth = VT.getStoreSizeInBits();
		assert(BitWidth == VT.getSizeInBits() && "must be a power of two");
		return IntegerType::get(T->getContext(), BitWidth);
		}

		/// Convert an atomic load of a non-integeral type to an integer load of the
		/// equivelent bitwidth. See the function comment on
		/// convertAtomicStoreToIntegerType for background.
		LoadInst *AtomicExpand::convertAtomicLoadToIntegerType(LoadInst *LI) {
		auto *M = LI->getModule();
		Type *NewTy = getCorrespondingIntegerType(LI->getType(),
		M->getDataLayout());

		IRBuilder<> Builder(LI);

		Value *Addr = LI->getPointerOperand();
		Type *PT = PointerType::get(NewTy,
		Addr->getType()->getPointerAddressSpace());
		Value *NewAddr = Builder.CreateBitCast(Addr, PT);

		auto *NewLI = Builder.CreateLoad(NewAddr);
		NewLI->setAlignment(LI->getAlignment());
		NewLI->setVolatile(LI->isVolatile());
		NewLI->setAtomic(LI->getOrdering(), LI->getSynchScope());
		DEBUG(dbgs() << "Replaced " << LI << " with " << NewLI << "\n");

		Value *NewVal = Builder.CreateBitCast(NewLI, LI->getType());
		LI->replaceAllUsesWith(NewVal);
		LI->eraseFromParent();
		return NewLI;
		}

bool AtomicExpand::tryExpandAtomicLoad(LoadInst *LI) {		bool AtomicExpand::tryExpandAtomicLoad(LoadInst *LI) {
switch (TLI->shouldExpandAtomicLoadInIR(LI)) {		switch (TLI->shouldExpandAtomicLoadInIR(LI)) {
case TargetLoweringBase::AtomicExpansionKind::None:		case TargetLoweringBase::AtomicExpansionKind::None:
return false;		return false;
case TargetLoweringBase::AtomicExpansionKind::LLSC:		case TargetLoweringBase::AtomicExpansionKind::LLSC:
return expandAtomicOpToLLSC(		return expandAtomicOpToLLSC(
LI, LI->getPointerOperand(), LI->getOrdering(),		LI, LI->getPointerOperand(), LI->getOrdering(),
[](IRBuilder<> &Builder, Value *Loaded) { return Loaded; });		[](IRBuilder<> &Builder, Value *Loaded) { return Loaded; });
Show All 34 Lines	bool AtomicExpand::expandAtomicLoadToCmpXchg(LoadInst *LI) {
Value *Loaded = Builder.CreateExtractValue(Pair, 0, "loaded");		Value *Loaded = Builder.CreateExtractValue(Pair, 0, "loaded");

LI->replaceAllUsesWith(Loaded);		LI->replaceAllUsesWith(Loaded);
LI->eraseFromParent();		LI->eraseFromParent();

return true;		return true;
}		}

		/// Convert an atomic store of a non-integeral type to an integer store of the
		/// equivelent bitwidth. We used to not support floating point or vector
		/// atomics in the IR at all. The backends learned to deal with the bitcast
		/// idiom because that was the only way of expressing the notion of a atomic
		/// float or vector store. The long term plan is to teach each backend to
		/// instruction select from the original atomic store, but as a migration
		/// mechanism, we convert back to the old format which the backends understand.
		/// Each backend will need individual work to recognize the new format.
		StoreInst *AtomicExpand::convertAtomicStoreToIntegerType(StoreInst *SI) {
		IRBuilder<> Builder(SI);
		auto *M = SI->getModule();
		Type *NewTy = getCorrespondingIntegerType(SI->getValueOperand()->getType(),
		M->getDataLayout());
		Value *NewVal = Builder.CreateBitCast(SI->getValueOperand(), NewTy);

		Value *Addr = SI->getPointerOperand();
		Type *PT = PointerType::get(NewTy,
		Addr->getType()->getPointerAddressSpace());
		Value *NewAddr = Builder.CreateBitCast(Addr, PT);

		StoreInst *NewSI = Builder.CreateStore(NewVal, NewAddr);
		NewSI->setAlignment(SI->getAlignment());
		NewSI->setVolatile(SI->isVolatile());
		NewSI->setAtomic(SI->getOrdering(), SI->getSynchScope());
		DEBUG(dbgs() << "Replaced " << SI << " with " << NewSI << "\n");
		SI->eraseFromParent();
		return NewSI;
		}
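
For readers less familiar with the bitcast idiom that this helper reintroduces, the following is a rough user-level C++ analogue, not the pass itself. It assumes a 32-bit `float`; the helper names `storeFloatAsInt` and `loadFloatAsInt` are hypothetical. `std::memcpy` stands in for the IR `bitcast` of the value, and the `std::atomic<std::uint32_t>` slot stands in for the pointer that has been bitcast to an integer pointer.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: reinterpret the float's bits as an integer, then
// perform an integer atomic store with the caller's ordering.
inline void storeFloatAsInt(std::atomic<std::uint32_t> &Slot, float V,
                            std::memory_order Order) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &V, sizeof(Bits)); // analogue of `bitcast float to i32`
  Slot.store(Bits, Order);
}

// Hypothetical helper: integer atomic load, then reinterpret as a float.
inline float loadFloatAsInt(const std::atomic<std::uint32_t> &Slot,
                            std::memory_order Order) {
  std::uint32_t Bits = Slot.load(Order);
  float V;
  std::memcpy(&V, &Bits, sizeof(V)); // analogue of `bitcast i32 to float`
  return V;
}
```

As in the helper above, the ordering is passed through unchanged; only the type of the stored value differs.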

bool AtomicExpand::expandAtomicStore(StoreInst *SI) {		bool AtomicExpand::expandAtomicStore(StoreInst *SI) {
// This function is only called on atomic stores that are too large to be		// This function is only called on atomic stores that are too large to be
// atomic if implemented as a native store. So we replace them by an		// atomic if implemented as a native store. So we replace them by an
// atomic swap, that can be implemented for example as a ldrex/strex on ARM		// atomic swap, that can be implemented for example as a ldrex/strex on ARM
// or lock cmpxchg8/16b on X86, as these are atomic for larger sizes.		// or lock cmpxchg8/16b on X86, as these are atomic for larger sizes.
// It is the responsibility of the target to only signal expansion via		// It is the responsibility of the target to only signal expansion via
// shouldExpandAtomicRMW in cases where this is required and possible.		// shouldExpandAtomicRMW in cases where this is required and possible.
IRBuilder<> Builder(SI);		IRBuilder<> Builder(SI);

lib/IR/Verifier.cpp

Show First 20 Lines • Show All 2,714 Lines • ▼ Show 20 Lines	void Verifier::visitLoadInst(LoadInst &LI) {
Assert(LI.getAlignment() <= Value::MaximumAlignment,		Assert(LI.getAlignment() <= Value::MaximumAlignment,
"huge alignment values are unsupported", &LI);		"huge alignment values are unsupported", &LI);
if (LI.isAtomic()) {		if (LI.isAtomic()) {
Assert(LI.getOrdering() != Release && LI.getOrdering() != AcquireRelease,		Assert(LI.getOrdering() != Release && LI.getOrdering() != AcquireRelease,
"Load cannot have Release ordering", &LI);		"Load cannot have Release ordering", &LI);
Assert(LI.getAlignment() != 0,		Assert(LI.getAlignment() != 0,
"Atomic load must specify explicit alignment", &LI);		"Atomic load must specify explicit alignment", &LI);
if (!ElTy->isPointerTy()) {		if (!ElTy->isPointerTy()) {
Assert(ElTy->isIntegerTy(), "atomic load operand must have integer type!",		Assert(ElTy->isIntegerTy() || ElTy->isFloatingPointTy(),
		"atomic load operand must have integer or floating point type!",
&LI, ElTy);		&LI, ElTy);
unsigned Size = ElTy->getPrimitiveSizeInBits();		unsigned Size = ElTy->getPrimitiveSizeInBits();
Assert(Size >= 8 && !(Size & (Size - 1)),		Assert(Size >= 8 && !(Size & (Size - 1)),
"atomic load operand must be power-of-two byte-sized integer", &LI,		"atomic load operand must be power-of-two byte-sized integer", &LI,
ElTy);		ElTy);
}		}
} else {		} else {
Assert(LI.getSynchScope() == CrossThread,		Assert(LI.getSynchScope() == CrossThread,
Show All 12 Lines	void Verifier::visitStoreInst(StoreInst &SI) {
Assert(SI.getAlignment() <= Value::MaximumAlignment,		Assert(SI.getAlignment() <= Value::MaximumAlignment,
"huge alignment values are unsupported", &SI);		"huge alignment values are unsupported", &SI);
if (SI.isAtomic()) {		if (SI.isAtomic()) {
Assert(SI.getOrdering() != Acquire && SI.getOrdering() != AcquireRelease,		Assert(SI.getOrdering() != Acquire && SI.getOrdering() != AcquireRelease,
"Store cannot have Acquire ordering", &SI);		"Store cannot have Acquire ordering", &SI);
Assert(SI.getAlignment() != 0,		Assert(SI.getAlignment() != 0,
"Atomic store must specify explicit alignment", &SI);		"Atomic store must specify explicit alignment", &SI);
if (!ElTy->isPointerTy()) {		if (!ElTy->isPointerTy()) {
Assert(ElTy->isIntegerTy(),		Assert(ElTy->isIntegerTy() || ElTy->isFloatingPointTy(),
"atomic store operand must have integer type!", &SI, ElTy);		"atomic store operand must have integer or floating point type!",
		&SI, ElTy);
unsigned Size = ElTy->getPrimitiveSizeInBits();		unsigned Size = ElTy->getPrimitiveSizeInBits();
Assert(Size >= 8 && !(Size & (Size - 1)),		Assert(Size >= 8 && !(Size & (Size - 1)),
"atomic store operand must be power-of-two byte-sized integer",		"atomic store operand must be power-of-two byte-sized integer",
&SI, ElTy);		&SI, ElTy);
}		}
} else {		} else {
Assert(SI.getSynchScope() == CrossThread,		Assert(SI.getSynchScope() == CrossThread,
"Non-atomic store cannot have SynchronizationScope specified", &SI);		"Non-atomic store cannot have SynchronizationScope specified", &SI);
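
The size check used in both visitors, `Size >= 8 && !(Size & (Size - 1))`, accepts bit widths that are a power of two and at least one byte. A minimal standalone sketch of that predicate follows; the function name `isPowerOfTwoByteSized` is made up for illustration and is not part of the Verifier.

```cpp
// Hypothetical standalone version of the size predicate shown above.
// A nonzero value with exactly one bit set satisfies (Size & (Size - 1)) == 0;
// requiring Size >= 8 additionally rules out 0 and sub-byte widths such as i1.
constexpr bool isPowerOfTwoByteSized(unsigned SizeInBits) {
  return SizeInBits >= 8 && !(SizeInBits & (SizeInBits - 1));
}

// Widths of the floating point types exercised by the tests in this review.
static_assert(isPowerOfTwoByteSized(16), "half");
static_assert(isPowerOfTwoByteSized(32), "float");
static_assert(isPowerOfTwoByteSized(64), "double");
static_assert(isPowerOfTwoByteSized(128), "fp128");
// An 80-bit width (as for x86_fp80) is not a power of two, so it fails.
static_assert(!isPowerOfTwoByteSized(80), "80-bit width");
static_assert(!isPowerOfTwoByteSized(1), "i1");
```

Under this reading, a floating point type can pass the new type check and still be rejected by the size check.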

test/CodeGen/X86/atomic-non-integer.ll

				; RUN: llc < %s -mtriple=x86_64-linux-generic -verify-machineinstrs -mattr=sse2 | FileCheck %s

				jfb: Also add a reference to: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
				reames: This is, or should be, documented in the code. Since I didn't write the lowering code for this and don't know what reasoning led to this particular emission, I'd rather not falsely cite something, given the later confusion it might cause.
				jfb: Could you document this at the top of the test? It's not clear from the name of the test that you're not checking correctness of the generated code.
				reames: I added a note to the top of the file. Let me know if you want something more.
				; Note: This test is testing that the lowering for atomics matches what we
				; currently emit for non-atomics + the atomic restriction. The presence of
				; particular lowering detail in these tests should not be read as requiring
				; that detail for correctness unless it's related to the atomicity itself.
				; (Specifically, there were reviewer questions about the lowering for halfs
				jfb: Why does this involve an f2h conversion? Is it a calling convention thing where half is passed as its integer equivalent? It would be worth documenting, I find it surprising.
				reames: Frankly, I have no clue. We emit the same conversion when doing a non-atomic store, so it's not related to the changes in this patch.
				jfb: I'd rather know if the test is correct, and fix elsewhere if not: otherwise it's hard to reason about this test when fixing bugs. I can't find anything about `half` or `fp16` in the calling convention, but the signature (`short __gnu_f2h_ieee(float)`) leads me to believe that the ABI passes things as `float` and then converts to `short` as a proxy for `half`. I think a cleaner test wouldn't pass a `half` in registers but would instead pass two pointers. That makes the test way easier to understand IMO.
				reames: This comment doesn't really parse for me. If you want to suggest a change here, please raise it on the mailing lists so that someone more knowledgeable than I can comment.
				; and their calling convention which remain unresolved.)

				define void @store_half(half* %fptr, half %v) {
				; CHECK-LABEL: @store_half
				; CHECK: movq %rdi, %rbx
				; CHECK: callq __gnu_f2h_ieee
				; CHECK: movw %ax, (%rbx)
				store atomic half %v, half* %fptr unordered, align 2
				ret void
				}

				define void @store_float(float* %fptr, float %v) {
				; CHECK-LABEL: @store_float
				; CHECK: movd %xmm0, %eax
				; CHECK: movl %eax, (%rdi)
				store atomic float %v, float* %fptr unordered, align 4
				ret void
				}

				define void @store_double(double* %fptr, double %v) {
				; CHECK-LABEL: @store_double
				; CHECK: movd %xmm0, %rax
				; CHECK: movq %rax, (%rdi)
				store atomic double %v, double* %fptr unordered, align 8
				ret void
				}
				jfb: Same.

				define void @store_fp128(fp128* %fptr, fp128 %v) {
				; CHECK-LABEL: @store_fp128
				; CHECK: callq __sync_lock_test_and_set_16
				store atomic fp128 %v, fp128* %fptr unordered, align 16
				ret void
				}

				define half @load_half(half* %fptr) {
				; CHECK-LABEL: @load_half
				; CHECK: movw (%rdi), %ax
				; CHECK: movzwl %ax, %edi
				; CHECK: jmp __gnu_h2f_ieee
				%v = load atomic half, half* %fptr unordered, align 2
				ret half %v
				}

				bcraig: All of your llc tests are currently testing unordered accesses. The interesting code gen on X86 is with seq_cst stores. I recommend adding tests for those, and ensuring that you get the appropriate [lock] xchg operations.
				define float @load_float(float* %fptr) {
				; CHECK-LABEL: @load_float
				; CHECK: movl (%rdi), %eax
				; CHECK: movd %eax, %xmm0
				%v = load atomic float, float* %fptr unordered, align 4
				jfb: For each of the `xchg` below, can you also `CHECK-NOT: lock` since the x86 manual states that the `lock` prefix is implicit.
				reames: This is an implementation detail that is irrelevant to this functionality. This is a) tested elsewhere, and b) irrelevant to the correctness of the code generation here.
				jfb: Oh you're right, `test/CodeGen/X86/atomic_mi.ll` tests this and `git blame` shows a familiar name on that!
				ret float %v
				}

				define double @load_double(double* %fptr) {
				; CHECK-LABEL: @load_double
				; CHECK: movq (%rdi), %rax
				; CHECK: movd %rax, %xmm0
				%v = load atomic double, double* %fptr unordered, align 8
				ret double %v
				}

				define fp128 @load_fp128(fp128* %fptr) {
				; CHECK-LABEL: @load_fp128
				; CHECK: callq __sync_val_compare_and_swap_16
				%v = load atomic fp128, fp128* %fptr unordered, align 16
				ret fp128 %v
				}


				; sanity check the seq_cst lowering since that's the
				; interesting one from an ordering perspective on x86.

				define void @store_float_seq_cst(float* %fptr, float %v) {
				; CHECK-LABEL: @store_float_seq_cst
				; CHECK: movd %xmm0, %eax
				; CHECK: xchgl %eax, (%rdi)
				store atomic float %v, float* %fptr seq_cst, align 4
				ret void
				}

				define void @store_double_seq_cst(double* %fptr, double %v) {
				; CHECK-LABEL: @store_double_seq_cst
				; CHECK: movd %xmm0, %rax
				; CHECK: xchgq %rax, (%rdi)
				store atomic double %v, double* %fptr seq_cst, align 8
				ret void
				}

				define float @load_float_seq_cst(float* %fptr) {
				; CHECK-LABEL: @load_float_seq_cst
				; CHECK: movl (%rdi), %eax
				; CHECK: movd %eax, %xmm0
				%v = load atomic float, float* %fptr seq_cst, align 4
				ret float %v
				}

				define double @load_double_seq_cst(double* %fptr) {
				; CHECK-LABEL: @load_double_seq_cst
				; CHECK: movq (%rdi), %rax
				; CHECK: movd %rax, %xmm0
				%v = load atomic double, double* %fptr seq_cst, align 8
				ret double %v
				}
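
In the CHECK lines above, the seq_cst store is expected to use `xchg` while the seq_cst load still uses a plain `mov`, so only the store changes relative to the unordered tests. As a rough source-level illustration of the same ordering distinction, the sketch below uses `std::atomic<float>`. This is an assumption made for illustration: the review does not say which IR a frontend emits for such code, C++ has no `unordered` ordering (`memory_order_relaxed` is used as the nearest stand-in), and the helper names are hypothetical.

```cpp
#include <atomic>

// Hypothetical helpers; the names are for illustration only.

// Weakest ordering C++ offers; stands in for the `unordered` tests.
inline void storeRelaxed(std::atomic<float> &A, float V) {
  A.store(V, std::memory_order_relaxed);
}

// Sequentially consistent store, the case the seq_cst tests single out.
inline void storeSeqCst(std::atomic<float> &A, float V) {
  A.store(V, std::memory_order_seq_cst);
}

// Sequentially consistent load.
inline float loadSeqCst(const std::atomic<float> &A) {
  return A.load(std::memory_order_seq_cst);
}
```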

test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll

				; RUN: opt -S %s -atomic-expand -mtriple=x86_64-linux-gnu | FileCheck %s

				; This file tests the functions `llvm::convertAtomicLoadToIntegerType` and
				; `llvm::convertAtomicStoreToIntegerType`. If X86 stops using this
				; functionality, please move this test to a target which still is.

				jfb: Also test `volatile` and `addressspace` combinations.
				define float @float_load_expand(float* %ptr) {
				; CHECK-LABEL: @float_load_expand
				; CHECK: %1 = bitcast float* %ptr to i32*
				; CHECK: %2 = load atomic i32, i32* %1 unordered, align 4
				; CHECK: %3 = bitcast i32 %2 to float
				; CHECK: ret float %3
				%res = load atomic float, float* %ptr unordered, align 4
				ret float %res
				}

				define float @float_load_expand_seq_cst(float* %ptr) {
				; CHECK-LABEL: @float_load_expand_seq_cst
				; CHECK: %1 = bitcast float* %ptr to i32*
				; CHECK: %2 = load atomic i32, i32* %1 seq_cst, align 4
				; CHECK: %3 = bitcast i32 %2 to float
				; CHECK: ret float %3
				%res = load atomic float, float* %ptr seq_cst, align 4
				ret float %res
				}

				define float @float_load_expand_vol(float* %ptr) {
				; CHECK-LABEL: @float_load_expand_vol
				; CHECK: %1 = bitcast float* %ptr to i32*
				; CHECK: %2 = load atomic volatile i32, i32* %1 unordered, align 4
				; CHECK: %3 = bitcast i32 %2 to float
				; CHECK: ret float %3
				%res = load atomic volatile float, float* %ptr unordered, align 4
				ret float %res
				}

				define float @float_load_expand_addr1(float addrspace(1)* %ptr) {
				; CHECK-LABEL: @float_load_expand_addr1
				; CHECK: %1 = bitcast float addrspace(1)* %ptr to i32 addrspace(1)*
				; CHECK: %2 = load atomic i32, i32 addrspace(1)* %1 unordered, align 4
				; CHECK: %3 = bitcast i32 %2 to float
				; CHECK: ret float %3
				%res = load atomic float, float addrspace(1)* %ptr unordered, align 4
				ret float %res
				}

				define void @float_store_expand(float* %ptr, float %v) {
				; CHECK-LABEL: @float_store_expand
				; CHECK: %1 = bitcast float %v to i32
				; CHECK: %2 = bitcast float* %ptr to i32*
				; CHECK: store atomic i32 %1, i32* %2 unordered, align 4
				store atomic float %v, float* %ptr unordered, align 4
				ret void
				}

				define void @float_store_expand_seq_cst(float* %ptr, float %v) {
				; CHECK-LABEL: @float_store_expand_seq_cst
				; CHECK: %1 = bitcast float %v to i32
				; CHECK: %2 = bitcast float* %ptr to i32*
				; CHECK: store atomic i32 %1, i32* %2 seq_cst, align 4
				store atomic float %v, float* %ptr seq_cst, align 4
				ret void
				}

				define void @float_store_expand_vol(float* %ptr, float %v) {
				; CHECK-LABEL: @float_store_expand_vol
				; CHECK: %1 = bitcast float %v to i32
				; CHECK: %2 = bitcast float* %ptr to i32*
				; CHECK: store atomic volatile i32 %1, i32* %2 unordered, align 4
				store atomic volatile float %v, float* %ptr unordered, align 4
				ret void
				}

				define void @float_store_expand_addr1(float addrspace(1)* %ptr, float %v) {
				; CHECK-LABEL: @float_store_expand_addr1
				; CHECK: %1 = bitcast float %v to i32
				; CHECK: %2 = bitcast float addrspace(1)* %ptr to i32 addrspace(1)*
				; CHECK: store atomic i32 %1, i32 addrspace(1)* %2 unordered, align 4
				store atomic float %v, float addrspace(1)* %ptr unordered, align 4
				ret void
				}
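
The CHECK lines above pair `float` with `i32` in every case. The body of `getCorrespondingIntegerType` is not shown in this excerpt, so the following compile-time sketch only illustrates the assumed rule, namely picking the integer type with the same bit width. The names `IntOfWidth` and `EquivalentInt` are hypothetical, and the sketch assumes 8-bit bytes, a 32-bit `float`, and a 64-bit `double`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical width-to-integer mapping. Widths without a specialization
// are left undefined, so using them is a compile error.
template <std::size_t Bits> struct IntOfWidth;
template <> struct IntOfWidth<16> { using type = std::uint16_t; };
template <> struct IntOfWidth<32> { using type = std::uint32_t; };
template <> struct IntOfWidth<64> { using type = std::uint64_t; };

// Integer type with the same bit width as T (assumes 8-bit bytes).
template <class T>
using EquivalentInt = typename IntOfWidth<sizeof(T) * 8>::type;

static_assert(std::is_same<EquivalentInt<float>, std::uint32_t>::value,
              "float pairs with a 32-bit integer, as in the CHECK lines");
static_assert(std::is_same<EquivalentInt<double>, std::uint64_t>::value,
              "double pairs with a 64-bit integer under the same rule");
```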

test/Verifier/atomics.ll

				; RUN: not opt -verify < %s 2>&1 | FileCheck %s

				; CHECK: atomic store operand must have integer or floating point type!
				; CHECK: atomic load operand must have integer or floating point type!
				jfb: Ha, that's not the best error message since `x86_mmx` is a floating-point type. I'll rework it in D15512 if that's OK with you.
				reames: Not according to LLVM's isFloatingPointTy it's not. Surprised me too. Fixing that might end up being a much larger change though.

				define void @foo(x86_mmx* %P, x86_mmx %v) {
				store atomic x86_mmx %v, x86_mmx* %P unordered, align 8
				ret void
				}

				define x86_mmx @bar(x86_mmx* %P) {
				%v = load atomic x86_mmx, x86_mmx* %P unordered, align 8
				ret x86_mmx %v
				}