This is an archive of the discontinued LLVM Phabricator instance.

Differential D132352

Introduce noread_thread_id to address the thread identification problem in coroutines
AbandonedPublic

Authored by ChuanqiXu on Aug 22 2022, 1:12 AM.

Details

Reviewers

nhaehnle
jyknight
nikic
rjmccall
efriedma
fhahn
ychen
jdoerfert
sstefan1

Summary

This implements the solution suggested by @nhaehnle, @jyknight and @fhahn (who prefer this to D127383). The suggested solution is described in: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48. The referenced solution is:

1. Introduce a new attribute, noread_thread_id.
2. Emit noread_thread_id in addition to readnone in the frontend for functions that are known not to read the thread ID implicitly or explicitly.
3. Mark the TLS intrinsic as only readnone.
4. Many checks for hasAttribute(ReadNone) (e.g. to guard CSE) have to become hasAttribute(ReadNone) && (!isCoroutine(CurrentFn) || hasAttribute(NoReadThreadID)). Though, note that this isn't the first time that happens. For example, we have a bunch of places that need to check hasAttribute(ReadNone) && !hasAttribute(Convergent). So there's precedent.
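
As a rough illustration of the guard described in the last point, the combined check could be modeled as below. `CalleeAttrs` and `mayMergeCalls` are hypothetical stand-ins rather than LLVM APIs, and the condition follows the logic of `cannotReadDifferentThreadIDIfMoved` in the diff; treat it as a sketch under those assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the attributes on a callee; not an LLVM type.
struct CalleeAttrs {
  bool ReadNone = false;       // models `readnone`
  bool NoReadThreadID = false; // models `noread_thread_id`
};

// Returns true if two identical calls to the callee may be merged (e.g. by
// CSE) in the current function. `InPresplitCoroutine` models
// isCoroutine(CurrentFn) from the summary.
inline bool mayMergeCalls(const CalleeAttrs &Callee, bool InPresplitCoroutine) {
  // Outside a presplit coroutine the thread ID is constant, so readnone is
  // enough. Inside one, the callee must also promise that it does not depend
  // on the thread ID.
  return Callee.ReadNone && (!InPresplitCoroutine || Callee.NoReadThreadID);
}
```

With this model, a `readnone` callee without `noread_thread_id` is mergeable in an ordinary function but not in a presplit coroutine.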

The patch doesn't implement the 2nd and 4th points completely. I think we can complete them step by step, since we lack enough tests for now. This decision may be a little painful, but I think it will be more stable.

Test Plans: check-all, https://godbolt.org/z/TGcPPP57K, folly and our internal projects.

Event Timeline

ChuanqiXu created this revision. · Aug 22 2022, 1:12 AM

Herald added a project: Restricted Project. · Aug 22 2022, 1:12 AM

Herald added subscribers: jeroen.dobbelaere, wenlei, okura and 4 others.

ChuanqiXu requested review of this revision. · Aug 22 2022, 1:12 AM

Herald added a reviewer: jdoerfert. · Aug 22 2022, 1:12 AM

Herald added a reviewer: sstefan1.

Herald added subscribers: llvm-commits, cfe-commits, sstefan1.

Harbormaster completed remote builds in B182519: Diff 454398. · Aug 22 2022, 2:36 AM

rjmccall added inline comments. · Aug 22 2022, 1:42 PM

llvm/docs/LangRef.rst
1931	Suggestion: This attribute indicates that the function does not rely on the identity of the current thread in any way, such as by reading the current thread ID or taking the address of a thread-local variable. If the function does rely on the identity of the current thread, the behavior is undefined.
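
To make "relying on the identity of the current thread" concrete, here is a small sketch contrasting a function that depends on it with one that does not. All names are made up for illustration; the attribute itself is only the one proposed in this review.

```cpp
#include <cassert>
#include <thread>

// A thread-local variable: each thread gets its own instance.
thread_local int TlsSlot = 0;

// Relies on thread identity: the returned address differs per thread, so a
// function like this could not carry the proposed `noread_thread_id`.
inline const int *addressOfTls() { return &TlsSlot; }

// Does not rely on thread identity: the result depends only on the argument.
inline int twice(int X) { return 2 * X; }

// Returns true if a second thread observes a different TLS address and a
// different thread ID than the calling thread.
inline bool identityDiffersAcrossThreads() {
  const int *Here = addressOfTls();
  std::thread::id HereId = std::this_thread::get_id();
  const int *There = nullptr;
  std::thread::id ThereId;
  std::thread Worker([&] {
    There = addressOfTls();
    ThereId = std::this_thread::get_id();
  });
  Worker.join();
  return Here != There && HereId != ThereId;
}
```

Calling `identityDiffersAcrossThreads()` from any thread should return true, whereas `twice` gives the same answer regardless of where it runs.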

rjmccall added inline comments. · Aug 22 2022, 1:49 PM

clang/lib/CodeGen/CGStmt.cpp
2260 ↗	(On Diff #454398)	Hmm, my comment here got lost somehow. This looks like a new semantic assumption. Can we split this patch so that this is separate from the introduction of the new attribute? I know we were assuming this until a few weeks ago, but still, it's generally best practice to do representation changes independent from semantic changes. Also, this is basically saying that relying on the thread identity is a side-effect that needs to be reflected in the constraints. Is there documentation justifying that assumption? Doesn't this make some kinds of existing code invalid?

Address comments:

Edit LangRef.rst.
Split the add-the-attributes part into later revisions.

ChuanqiXu marked an inline comment as done. · Aug 22 2022, 8:43 PM

Okay. Doc parts LGTM, and I have some naming suggestions for the core methods.

llvm/include/llvm/IR/InstrTypes.h
1855	The closest existing precedent would suggest `doesNotReadThreadID` and `setDoesNotReadThreadID`.
1857
1863	This is an odd use of "nor". Maybe take a different approach — `canReadDifferentThreadIDIfMoved()`?

Harbormaster completed remote builds in B182734: Diff 454683. · Aug 22 2022, 9:46 PM

Address comments.

I posted the incorrect version last time.

ChuanqiXu mentioned this in D132434: Add noread_thread_id attribute to intrinsics. · Aug 22 2022, 10:50 PM

Harbormaster completed remote builds in B182739: Diff 454697. · Aug 23 2022, 12:11 AM

rjmccall added inline comments. · Aug 23 2022, 9:25 AM

llvm/include/llvm/IR/InstrTypes.h
1863	Oh, I didn't notice this last night — `canReadDifferentThreadIDIfMoved` has the opposite sense of the old method, so either you need to negate the logic in the method and all its call sites, or you need to rename it something like `cannotReadDifferentThreadIDIfMoved`.
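
The polarity issue raised here can be sketched with a tiny model. `CallModel` is a hypothetical miniature, not the real `llvm::CallBase`; it only shows that a guard written against the positive spelling has to be negated to stay equivalent.

```cpp
#include <cassert>

// Hypothetical miniature of a call site; not the real llvm::CallBase.
struct CallModel {
  bool NoReadThreadID;
  bool InPresplitCoroutine;

  // Negative sense: true means moving the call is safe with respect to the
  // thread ID.
  bool cannotReadDifferentThreadIDIfMoved() const {
    return NoReadThreadID || !InPresplitCoroutine;
  }

  // Positive sense: same information, opposite polarity.
  bool canReadDifferentThreadIDIfMoved() const {
    return !cannotReadDifferentThreadIDIfMoved();
  }
};

// A guard written against either spelling must give the same answer, which
// means the positive spelling has to be negated at every call site.
inline bool safeToMoveNegativeSpelling(const CallModel &C) {
  return C.cannotReadDifferentThreadIDIfMoved();
}
inline bool safeToMovePositiveSpelling(const CallModel &C) {
  return !C.canReadDifferentThreadIDIfMoved();
}
```

Renaming the method without flipping either the body or the call sites would silently invert every guard that uses it.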

Address comments.

ChuanqiXu marked an inline comment as done. · Aug 24 2022, 7:47 AM

ChuanqiXu added inline comments.

llvm/include/llvm/IR/InstrTypes.h
1863	Oh, my bad. I should have checked that. Thanks for double checking!

Harbormaster completed remote builds in B183106: Diff 455209. · Aug 24 2022, 9:06 AM

Seems okay to me, but like I said, it'd be good to get AA eyes on it.

nikic added a child revision: D132434: Add noread_thread_id attribute to intrinsics. · Aug 29 2022, 3:47 AM

Okay, this is a bit tricky because we have three different things:

The noread_thread_id attribute, the lack of which was causing issues with intrinsics in the previous version
The meaning of the readnone (etc) attributes, which for pragmatic reasons has to exclude thread IDs for now
The meaning of doesNotReadMemory() etc queries, which in the previous version included thread ID accesses, but in the new version require a separate call

I think my question here would be why this did not stick with the previous implementation approach that also affects doesNotReadMemory and AA queries (and thus makes everything "automatically correct"), and only added the noread_thread_id attribute to make intrinsic handling more precise?

My general vision for this area was that after D130896, we would add ThreadID as an additional ModRef location, which gets removed for non-presplit-coroutines due to being constant. This would follow the interpretation that the thread ID is part of "memory" though, which kind of goes against the approach here.
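
One way to picture the alternative described in this comment is a per-location mask in which the thread ID is just another location. The enum and helpers below are purely illustrative assumptions and do not correspond to the actual design in D130896.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical location mask, loosely modeled on the idea of per-location
// mod/ref information; these names are illustrative only.
enum Location : std::uint8_t {
  LocNone = 0,
  LocArgMem = 1 << 0,
  LocOther = 1 << 1,
  LocThreadID = 1 << 2, // the proposed additional location
};

// Locations a call may read, as seen from a given caller. Outside a presplit
// coroutine the thread ID is constant, so that bit can be dropped.
inline std::uint8_t effectiveReads(std::uint8_t DeclaredReads,
                                   bool CallerIsPresplitCoroutine) {
  if (!CallerIsPresplitCoroutine)
    return static_cast<std::uint8_t>(DeclaredReads & ~LocThreadID);
  return DeclaredReads;
}

inline bool readsNothing(std::uint8_t Reads) { return Reads == LocNone; }
```

Under this reading, a call that only reads the thread ID looks like it reads nothing in an ordinary function, and existing "does not read memory" queries stay correct without a separate check.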

In reply to @nikic's comment above (D132352#3755355):

I think the key point here is whether or not "thread_id" is part of "memory". According to https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48, we agreed to treat "thread_id" as not part of memory. I feel the idea is to make these attributes more composable (@nhaehnle). It looks like @jyknight, @fhahn, @rjmccall and @efriedma tend to agree with this direction, if I am not misreading. Your proposed solution should be workable too. I think we need to reach consensus on whether or not "thread_id" is part of "memory".

Another benefit of this method is that it helps solve a potential similar problem in green threads (also called stackful coroutines, or fibers). We mentioned it here: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/28

(Some background on stackful coroutines: stackful coroutines are not a standard feature but a vendor extension. The coroutine intrinsics in LLVM currently work for stackless coroutines. The general implementation of stackful coroutines is not compiler dependent: they save each register manually when switching. So it is hard to detect stackful coroutines in the compiler.)

Stackful coroutine bodies should be straightforward to support on top of the other work you've been doing, if anyone's actually interested in pursuing them. As far as the optimizer needs to know, a stackful coroutine function is just like a presplit stackless coroutine except that calls and returns work normally and it's never split. Because it's never split, the backends would need to understand that they can't arbitrarily reorder TLS materializations and so on in those functions, which would probably be the most complicated piece of work there. Otherwise, I think we'd just need to mark stackful coroutine bodies with some new attribute and then change cannotReadDifferentThreadIDIfMoved to check for that, the same way it checks for presplit stackless coroutines.

In reply to @rjmccall's comment above (D132352#3757415):

As far as I understand, we can't mark stackful coroutine bodies with special attributes. They are slightly different from stackless coroutines. A stackless coroutine is a suspendable function, so we can mark the function. But a stackful coroutine is actually a thread in user space (or we can think of a stackful coroutine as a stack instead of a function). In other words, if a function A in a stackful coroutine calls another function B, then B lives in the stackful coroutine too. Stackful coroutines switch via user (library) implemented methods, which are not standardized, so we can't even do hacks for them.

I am not pursuing stackful coroutines either. I raised the example to show that the idea of noread_thread_id may have a slight advantage.

In reply to @ChuanqiXu's comment above (D132352#3757433):

Well, it's complicated. Stackful coroutines don't generally involve creating a true thread, just a stack, and then yes, you swap the stack on the current thread in some system-specific way. What I'm pointing out is that stackful coroutines can therefore observe changes in their thread ID across suspension points, which is the exact same problem as stackless coroutines have. So we actually *would* need them to be identified specially, even if they're otherwise straightforward to compile, just so that we understand those special semantics and don't e.g. miscompile TLV accesses by hoisting them over a suspension. (That this is still necessary might perhaps undermine some of the arguments for pursuing stackful coroutines in the first place; nonetheless, it's true.)
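
The miscompile mentioned here (hoisting a TLV access over a suspension) can be illustrated with a toy model. It deliberately uses an array of slots in place of real thread-local storage and plain function arguments in place of a real suspension, so it is only a sketch of the hazard, not of how LLVM or any coroutine library behaves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model: each "thread" owns one slot, standing in for a thread-local
// variable. Real TLS and real suspension are deliberately not used here.
using ThreadSlots = std::array<int, 2>;

inline int *tlsAddress(ThreadSlots &Slots, std::size_t CurrentThread) {
  return &Slots[CurrentThread];
}

// Correct: the address is re-materialized after the suspension point, so the
// second write lands in the slot of the thread that resumed the coroutine.
inline void runRematerialized(ThreadSlots &Slots, std::size_t StartThread,
                              std::size_t ResumeThread) {
  *tlsAddress(Slots, StartThread) += 1; // before suspension
  // --- suspension point: may resume on another thread ---
  *tlsAddress(Slots, ResumeThread) += 1; // after suspension
}

// Miscompiled: the address computed before the suspension is reused after it.
inline void runHoisted(ThreadSlots &Slots, std::size_t StartThread,
                       std::size_t ResumeThread) {
  int *Cached = tlsAddress(Slots, StartThread);
  *Cached += 1;
  // --- suspension point ---
  (void)ResumeThread; // the resuming thread is ignored, which is the bug
  *Cached += 1;
}
```

When the start and resume threads differ, the hoisted version writes twice to the first thread's slot and never touches the second, which is the behavior the special marking is meant to rule out.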

> I am not pursuing stackful coroutines either. I raised the example to show that the idea of noread_thread_id may have a slight advantage.

Sure, we don't need to talk about this any further, since nobody's interested in pursuing it right now. I'm just trying to underline the connections to what we've already done.

@nhaehnle @jyknight @nikic @efriedma @fhahn ping~

We prefer https://reviews.llvm.org/D135550

ChuanqiXu abandoned this revision. · Oct 16 2022, 7:59 PM

Revision Contents

Path	Size
llvm/docs/LangRef.rst	7 lines
llvm/include/llvm/Bitcode/LLVMBitCodes.h	1 line
llvm/include/llvm/IR/Attributes.td	3 lines
llvm/include/llvm/IR/InstrTypes.h	18 lines
llvm/lib/Bitcode/Reader/BitcodeReader.cpp	2 lines
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp	2 lines
llvm/lib/Transforms/Scalar/EarlyCSE.cpp	7 lines
llvm/lib/Transforms/Scalar/GVN.cpp	14 lines
llvm/lib/Transforms/Scalar/NewGVN.cpp	18 lines
llvm/lib/Transforms/Utils/CodeExtractor.cpp	1 line
llvm/test/Transforms/Coroutines/coro-readnone-01.ll	89 lines
llvm/test/Transforms/Coroutines/coro-readnone-02.ll	81 lines
llvm/test/Transforms/Coroutines/coro-readnone-03.ll	64 lines

Diff 455209

llvm/docs/LangRef.rst

Show First 20 Lines • Show All 1,920 Lines • ▼ Show 20 Lines	``"probe-stack"``
the stack probing function that will be called.		the stack probing function that will be called.

If a function that has a ``"probe-stack"`` attribute is inlined into		If a function that has a ``"probe-stack"`` attribute is inlined into
a function with another ``"probe-stack"`` attribute, the resulting		a function with another ``"probe-stack"`` attribute, the resulting
function has the ``"probe-stack"`` attribute of the caller. If a		function has the ``"probe-stack"`` attribute of the caller. If a
function that has a ``"probe-stack"`` attribute is inlined into a		function that has a ``"probe-stack"`` attribute is inlined into a
function that has no ``"probe-stack"`` attribute at all, the resulting		function that has no ``"probe-stack"`` attribute at all, the resulting
function has the ``"probe-stack"`` attribute of the callee.		function has the ``"probe-stack"`` attribute of the callee.
		``noread_thread_id``
		This attribute indicates that the function does not rely on the
		identity of the current thread in any way, such as by reading the
		current thread ID or taking the address of a thread-local variable.

		If the function does rely on the identity of the current thread,
		the behavior is undefined.
``readnone``		``readnone``
On a function, this attribute indicates that the function computes its		On a function, this attribute indicates that the function computes its
result (or decides to unwind an exception) based strictly on its arguments,		result (or decides to unwind an exception) based strictly on its arguments,
without dereferencing any pointer arguments or otherwise accessing		without dereferencing any pointer arguments or otherwise accessing
any mutable state (e.g. memory, control registers, etc) visible outside the		any mutable state (e.g. memory, control registers, etc) visible outside the
``readnone`` function. It does not write through any pointer arguments		``readnone`` function. It does not write through any pointer arguments
(including ``byval`` arguments) and never changes any state visible to		(including ``byval`` arguments) and never changes any state visible to
callers. This means while it cannot unwind exceptions by calling the ``C++``		callers. This means while it cannot unwind exceptions by calling the ``C++``
▲ Show 20 Lines • Show All 9,991 Lines • Show Last 20 Lines

llvm/include/llvm/Bitcode/LLVMBitCodes.h

Show First 20 Lines • Show All 684 Lines • ▼ Show 20 Lines	enum AttributeKindCodes {
ATTR_KIND_DISABLE_SANITIZER_INSTRUMENTATION = 78,		ATTR_KIND_DISABLE_SANITIZER_INSTRUMENTATION = 78,
ATTR_KIND_NO_SANITIZE_BOUNDS = 79,		ATTR_KIND_NO_SANITIZE_BOUNDS = 79,
ATTR_KIND_ALLOC_ALIGN = 80,		ATTR_KIND_ALLOC_ALIGN = 80,
ATTR_KIND_ALLOCATED_POINTER = 81,		ATTR_KIND_ALLOCATED_POINTER = 81,
ATTR_KIND_ALLOC_KIND = 82,		ATTR_KIND_ALLOC_KIND = 82,
ATTR_KIND_PRESPLIT_COROUTINE = 83,		ATTR_KIND_PRESPLIT_COROUTINE = 83,
ATTR_KIND_FNRETTHUNK_EXTERN = 84,		ATTR_KIND_FNRETTHUNK_EXTERN = 84,
ATTR_KIND_SKIP_PROFILE = 85,		ATTR_KIND_SKIP_PROFILE = 85,
		ATTR_KIND_NO_READ_THREAD_ID = 86,
};		};

enum ComdatSelectionKindCodes {		enum ComdatSelectionKindCodes {
COMDAT_SELECTION_KIND_ANY = 1,		COMDAT_SELECTION_KIND_ANY = 1,
COMDAT_SELECTION_KIND_EXACT_MATCH = 2,		COMDAT_SELECTION_KIND_EXACT_MATCH = 2,
COMDAT_SELECTION_KIND_LARGEST = 3,		COMDAT_SELECTION_KIND_LARGEST = 3,
COMDAT_SELECTION_KIND_NO_DUPLICATES = 4,		COMDAT_SELECTION_KIND_NO_DUPLICATES = 4,
COMDAT_SELECTION_KIND_SAME_SIZE = 5,		COMDAT_SELECTION_KIND_SAME_SIZE = 5,
Show All 14 Lines

llvm/include/llvm/IR/Attributes.td

	Show First 20 Lines • Show All 208 Lines • ▼ Show 20 Lines
	def OptimizeForSize : EnumAttr<"optsize", [FnAttr]>;			def OptimizeForSize : EnumAttr<"optsize", [FnAttr]>;

	/// Function must not be optimized.			/// Function must not be optimized.
	def OptimizeNone : EnumAttr<"optnone", [FnAttr]>;			def OptimizeNone : EnumAttr<"optnone", [FnAttr]>;

	/// Similar to byval but without a copy.			/// Similar to byval but without a copy.
	def Preallocated : TypeAttr<"preallocated", [FnAttr, ParamAttr]>;			def Preallocated : TypeAttr<"preallocated", [FnAttr, ParamAttr]>;

				/// Function does not read thread ID.
				def NoReadThreadID : EnumAttr<"noread_thread_id", [FnAttr]>;

	/// Function does not access memory.			/// Function does not access memory.
	def ReadNone : EnumAttr<"readnone", [FnAttr, ParamAttr]>;			def ReadNone : EnumAttr<"readnone", [FnAttr, ParamAttr]>;

	/// Function only reads from memory.			/// Function only reads from memory.
	def ReadOnly : EnumAttr<"readonly", [FnAttr, ParamAttr]>;			def ReadOnly : EnumAttr<"readonly", [FnAttr, ParamAttr]>;

	/// Return value is always equal to this argument.			/// Return value is always equal to this argument.
	def Returned : EnumAttr<"returned", [ParamAttr]>;			def Returned : EnumAttr<"returned", [ParamAttr]>;
	▲ Show 20 Lines • Show All 149 Lines • Show Last 20 Lines

llvm/include/llvm/IR/InstrTypes.h

Show First 20 Lines • Show All 1,841 Lines • ▼ Show 20 Lines public:

}		}

/// Determine if the call requires strict floating point semantics.		/// Determine if the call requires strict floating point semantics.
bool isStrictFP() const { return hasFnAttr(Attribute::StrictFP); }		bool isStrictFP() const { return hasFnAttr(Attribute::StrictFP); }

/// Return true if the call should not be inlined.		/// Return true if the call should not be inlined.
bool isNoInline() const { return hasFnAttr(Attribute::NoInline); }		bool isNoInline() const { return hasFnAttr(Attribute::NoInline); }
void setIsNoInline() { addFnAttr(Attribute::NoInline); }		void setIsNoInline() { addFnAttr(Attribute::NoInline); }

		/// Determine if the call does not read thread ID.
		bool doesNotReadThreadID() const {
		return hasFnAttr(Attribute::NoReadThreadID);
		}
		void setDoesNotReadThreadID() { addFnAttr(Attribute::NoReadThreadID); }

		/// Return true if the call does not read thread ID or the call doesn't live
		/// in a presplit coroutine function.
		///
		/// This implies that the call never reads different thread ID in the parent
		/// function. So that the optimizer could merge such calls if the calls does
		/// not access or only reads memory.
		bool cannotReadDifferentThreadIDIfMoved() const {
		return doesNotReadThreadID() || !getFunction() ||
		!getFunction()->isPresplitCoroutine();
		}

/// Determine if the call does not access memory.		/// Determine if the call does not access memory.
bool doesNotAccessMemory() const { return hasFnAttr(Attribute::ReadNone); }		bool doesNotAccessMemory() const { return hasFnAttr(Attribute::ReadNone); }
void setDoesNotAccessMemory() { addFnAttr(Attribute::ReadNone); }		void setDoesNotAccessMemory() { addFnAttr(Attribute::ReadNone); }

/// Determine if the call does not access or only reads memory.		/// Determine if the call does not access or only reads memory.
bool onlyReadsMemory() const {		bool onlyReadsMemory() const {
return hasImpliedFnAttr(Attribute::ReadOnly);		return hasImpliedFnAttr(Attribute::ReadOnly);
}		}

rjmccall (inline suggested edit on the doc comment of cannotReadDifferentThreadIDIfMoved, marked Done): change "doesn't lives" to "doesn't live".

▲ Show 20 Lines • Show All 581 Lines • Show Last 20 Lines

llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Show First 20 Lines • Show All 1,999 Lines • ▼ Show 20 Lines	static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
case bitc::ATTR_KIND_BYREF:		case bitc::ATTR_KIND_BYREF:
return Attribute::ByRef;		return Attribute::ByRef;
case bitc::ATTR_KIND_MUSTPROGRESS:		case bitc::ATTR_KIND_MUSTPROGRESS:
return Attribute::MustProgress;		return Attribute::MustProgress;
case bitc::ATTR_KIND_HOT:		case bitc::ATTR_KIND_HOT:
return Attribute::Hot;		return Attribute::Hot;
case bitc::ATTR_KIND_PRESPLIT_COROUTINE:		case bitc::ATTR_KIND_PRESPLIT_COROUTINE:
return Attribute::PresplitCoroutine;		return Attribute::PresplitCoroutine;
		case bitc::ATTR_KIND_NO_READ_THREAD_ID:
		return Attribute::NoReadThreadID;
}		}
}		}

Error BitcodeReader::parseAlignmentValue(uint64_t Exponent,		Error BitcodeReader::parseAlignmentValue(uint64_t Exponent,
MaybeAlign &Alignment) {		MaybeAlign &Alignment) {
// Note: Alignment in bitcode files is incremented by 1, so that zero		// Note: Alignment in bitcode files is incremented by 1, so that zero
// can be used for default alignment.		// can be used for default alignment.
if (Exponent > Value::MaxAlignmentExponent + 1)		if (Exponent > Value::MaxAlignmentExponent + 1)
▲ Show 20 Lines • Show All 5,951 Lines • Show Last 20 Lines

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

Show First 20 Lines • Show All 776 Lines • ▼ Show 20 Lines	static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
case Attribute::NoUndef:		case Attribute::NoUndef:
return bitc::ATTR_KIND_NOUNDEF;		return bitc::ATTR_KIND_NOUNDEF;
case Attribute::ByRef:		case Attribute::ByRef:
return bitc::ATTR_KIND_BYREF;		return bitc::ATTR_KIND_BYREF;
case Attribute::MustProgress:		case Attribute::MustProgress:
return bitc::ATTR_KIND_MUSTPROGRESS;		return bitc::ATTR_KIND_MUSTPROGRESS;
case Attribute::PresplitCoroutine:		case Attribute::PresplitCoroutine:
return bitc::ATTR_KIND_PRESPLIT_COROUTINE;		return bitc::ATTR_KIND_PRESPLIT_COROUTINE;
		case Attribute::NoReadThreadID:
		return bitc::ATTR_KIND_NO_READ_THREAD_ID;
case Attribute::EndAttrKinds:		case Attribute::EndAttrKinds:
llvm_unreachable("Can not encode end-attribute kinds marker.");		llvm_unreachable("Can not encode end-attribute kinds marker.");
case Attribute::None:		case Attribute::None:
llvm_unreachable("Can not encode none-attribute.");		llvm_unreachable("Can not encode none-attribute.");
case Attribute::EmptyKey:		case Attribute::EmptyKey:
case Attribute::TombstoneKey:		case Attribute::TombstoneKey:
llvm_unreachable("Trying to encode EmptyKey/TombstoneKey");		llvm_unreachable("Trying to encode EmptyKey/TombstoneKey");
}		}
▲ Show 20 Lines • Show All 4,259 Lines • Show Last 20 Lines
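
The reader and writer hunks are two halves of one mapping between the in-memory attribute kind and a stable numeric code. A minimal sketch of that round trip follows; the enum and helper names are hypothetical, and only the values 83 and 86 are taken from the LLVMBitCodes.h hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of the attribute enum; the real one lives in LLVM.
enum class AttrKind { None, PresplitCoroutine, NoReadThreadID };

// On-disk codes. 83 and 86 are the values shown in the LLVMBitCodes.h hunk.
constexpr std::uint64_t CodePresplitCoroutine = 83;
constexpr std::uint64_t CodeNoReadThreadID = 86;

// Writer direction: in-memory kind -> stable numeric code.
inline std::uint64_t encodeAttrKind(AttrKind Kind) {
  switch (Kind) {
  case AttrKind::PresplitCoroutine:
    return CodePresplitCoroutine;
  case AttrKind::NoReadThreadID:
    return CodeNoReadThreadID;
  case AttrKind::None:
    break;
  }
  return 0; // sketch only: 0 stands for "not encodable"
}

// Reader direction: numeric code -> in-memory kind; unknown codes map to None.
inline AttrKind decodeAttrKind(std::uint64_t Code) {
  switch (Code) {
  case CodePresplitCoroutine:
    return AttrKind::PresplitCoroutine;
  case CodeNoReadThreadID:
    return AttrKind::NoReadThreadID;
  default:
    return AttrKind::None;
  }
}
```

Keeping both switches in sync is what lets an attribute survive a write followed by a read; adding a case to only one side would drop or misread it.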

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

Show First 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
// the rounding mode to change.		// the rounding mode to change.
if (CFP->getRoundingMode() &&		if (CFP->getRoundingMode() &&
CFP->getRoundingMode() == RoundingMode::Dynamic)		CFP->getRoundingMode() == RoundingMode::Dynamic)
return false;		return false;
return true;		return true;
}		}
}		}
}		}
return CI->doesNotAccessMemory() && !CI->getType()->isVoidTy();		return CI->doesNotAccessMemory() &&
		CI->cannotReadDifferentThreadIDIfMoved() &&
		!CI->getType()->isVoidTy();
}		}
return isa<CastInst>(Inst) || isa<UnaryOperator>(Inst) ||		return isa<CastInst>(Inst) || isa<UnaryOperator>(Inst) ||
isa<BinaryOperator>(Inst) || isa<GetElementPtrInst>(Inst) ||		isa<BinaryOperator>(Inst) || isa<GetElementPtrInst>(Inst) ||
isa<CmpInst>(Inst) || isa<SelectInst>(Inst) ||		isa<CmpInst>(Inst) || isa<SelectInst>(Inst) ||
isa<ExtractElementInst>(Inst) || isa<InsertElementInst>(Inst) ||		isa<ExtractElementInst>(Inst) || isa<InsertElementInst>(Inst) ||
isa<ShuffleVectorInst>(Inst) || isa<ExtractValueInst>(Inst) ||		isa<ShuffleVectorInst>(Inst) || isa<ExtractValueInst>(Inst) ||
isa<InsertValueInst>(Inst) || isa<FreezeInst>(Inst);		isa<InsertValueInst>(Inst) || isa<FreezeInst>(Inst);
}		}
▲ Show 20 Lines • Show All 314 Lines • ▼ Show 20 Lines	struct CallValue {
}		}

static bool canHandle(Instruction *Inst) {		static bool canHandle(Instruction *Inst) {
// Don't value number anything that returns void.		// Don't value number anything that returns void.
if (Inst->getType()->isVoidTy())		if (Inst->getType()->isVoidTy())
return false;		return false;

CallInst *CI = dyn_cast<CallInst>(Inst);		CallInst *CI = dyn_cast<CallInst>(Inst);
if (!CI || !CI->onlyReadsMemory())		if (!CI || !CI->onlyReadsMemory() ||
		!CI->cannotReadDifferentThreadIDIfMoved())
return false;		return false;
return true;		return true;
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

namespace llvm {		namespace llvm {
▲ Show 20 Lines • Show All 1,316 Lines • Show Last 20 Lines
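
To show the effect of the extra guard in the EarlyCSE hunks, here is a toy common-subexpression pass over a flat list of calls. It ignores control flow and dominance entirely, and `ToyCall` and `toyCSE` are invented for this sketch, so it should be read as an illustration of the guard rather than of EarlyCSE itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical call record; not an LLVM type.
struct ToyCall {
  std::string Callee;
  int Arg;
  bool ReadNone;
  bool NoReadThreadID;
};

// Returns, for each call, the index of the earlier call it may reuse, or its
// own index if it must be kept. Only readnone calls whose result cannot
// depend on the thread they run on are merged.
inline std::vector<std::size_t> toyCSE(const std::vector<ToyCall> &Calls,
                                       bool InPresplitCoroutine) {
  std::map<std::pair<std::string, int>, std::size_t> Available;
  std::vector<std::size_t> Leader(Calls.size());
  for (std::size_t I = 0; I < Calls.size(); ++I) {
    const ToyCall &C = Calls[I];
    Leader[I] = I;
    bool ThreadSafeToMerge = C.NoReadThreadID || !InPresplitCoroutine;
    if (!C.ReadNone || !ThreadSafeToMerge)
      continue; // never recorded, so later identical calls are kept as well
    auto Inserted = Available.emplace(std::make_pair(C.Callee, C.Arg), I);
    if (!Inserted.second)
      Leader[I] = Inserted.first->second; // reuse the earlier identical call
  }
  return Leader;
}
```

In an ordinary function both duplicate pairs are merged; in a presplit coroutine only the pair marked as not reading the thread ID is.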

llvm/lib/Transforms/Scalar/GVN.cpp

Show First 20 Lines • Show All 444 Lines • ▼ Show 20 Lines
/// add - Insert a value into the table with a specified value number.		/// add - Insert a value into the table with a specified value number.
void GVNPass::ValueTable::add(Value *V, uint32_t num) {		void GVNPass::ValueTable::add(Value *V, uint32_t num) {
valueNumbering.insert(std::make_pair(V, num));		valueNumbering.insert(std::make_pair(V, num));
if (PHINode *PN = dyn_cast<PHINode>(V))		if (PHINode *PN = dyn_cast<PHINode>(V))
NumberingPhi[num] = PN;		NumberingPhi[num] = PN;
}		}

uint32_t GVNPass::ValueTable::lookupOrAddCall(CallInst *C) {		uint32_t GVNPass::ValueTable::lookupOrAddCall(CallInst *C) {
if (AA->doesNotAccessMemory(C)) {		if (AA->doesNotAccessMemory(C) &&
		C->cannotReadDifferentThreadIDIfMoved()) {
Expression exp = createExpr(C);		Expression exp = createExpr(C);
uint32_t e = assignExpNewValueNum(exp).first;		uint32_t e = assignExpNewValueNum(exp).first;
valueNumbering[C] = e;		valueNumbering[C] = e;
return e;		return e;
} else if (MD && AA->onlyReadsMemory(C)) {		}

		if (MD && AA->onlyReadsMemory(C) &&
		C->cannotReadDifferentThreadIDIfMoved()) {
Expression exp = createExpr(C);		Expression exp = createExpr(C);
auto ValNum = assignExpNewValueNum(exp);		auto ValNum = assignExpNewValueNum(exp);
if (ValNum.second) {		if (ValNum.second) {
valueNumbering[C] = ValNum.first;		valueNumbering[C] = ValNum.first;
return ValNum.first;		return ValNum.first;
}		}

MemDepResult local_dep = MD->getDependency(C);		MemDepResult local_dep = MD->getDependency(C);
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = C->arg_size(); i < e; ++i) {
valueNumbering[C] = nextValueNumber;		valueNumbering[C] = nextValueNumber;
return nextValueNumber++;		return nextValueNumber++;
}		}
}		}

uint32_t v = lookupOrAdd(cdep);		uint32_t v = lookupOrAdd(cdep);
valueNumbering[C] = v;		valueNumbering[C] = v;
return v;		return v;
} else {		}

valueNumbering[C] = nextValueNumber;		valueNumbering[C] = nextValueNumber;
return nextValueNumber++;		return nextValueNumber++;
}		}
}

/// Returns true if a value number exists for the specified value.		/// Returns true if a value number exists for the specified value.
bool GVNPass::ValueTable::exists(Value *V) const {		bool GVNPass::ValueTable::exists(Value *V) const {
return valueNumbering.count(V) != 0;		return valueNumbering.count(V) != 0;
}		}

/// lookup_or_add - Returns the value number for the specified value, assigning		/// lookup_or_add - Returns the value number for the specified value, assigning
/// it a new number if it did not have one before.		/// it a new number if it did not have one before.
▲ Show 20 Lines • Show All 2,675 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/NewGVN.cpp

@@ ... @@ if (auto *II = dyn_cast<IntrinsicInst>(I)) {
     // Intrinsics with the returned attribute are copies of arguments.
     if (auto *ReturnedValue = II->getReturnedArgOperand()) {
       if (II->getIntrinsicID() == Intrinsic::ssa_copy)
         if (auto Res = performSymbolicPredicateInfoEvaluation(II))
           return Res;
       return ExprResult::some(createVariableOrConstant(ReturnedValue));
     }
   }
-  if (AA->doesNotAccessMemory(CI)) {
+  if (!CI->cannotReadDifferentThreadIDIfMoved())
+    return ExprResult::none();
+
+  if (AA->doesNotAccessMemory(CI))
     return ExprResult::some(
         createCallExpression(CI, TOPClass->getMemoryLeader()));
-  } else if (AA->onlyReadsMemory(CI)) {
+  if (AA->onlyReadsMemory(CI)) {
     if (auto *MA = MSSA->getMemoryAccess(CI)) {
       auto *DefiningAccess = MSSAWalker->getClobberingMemoryAccess(MA);
       return ExprResult::some(createCallExpression(CI, DefiningAccess));
-    } else // MSSA determined that CI does not access memory.
+    }
+
+    // MSSA determined that CI does not access memory.
     return ExprResult::some(
         createCallExpression(CI, TOPClass->getMemoryLeader()));
   }

   return ExprResult::none();
 }

 // Retrieve the memory class for a given MemoryAccess.
 CongruenceClass *NewGVN::getMemoryClass(const MemoryAccess *MA) const {
   auto *Result = MemoryAccessToClass.lookup(MA);
   assert(Result && "Should have found memory class");
   return Result;

llvm/lib/Transforms/Utils/CodeExtractor.cpp

@@ ... @@ if (Attr.isStringAttribute()) {
 case Attribute::ReadOnly:
 case Attribute::ReturnsTwice:
 case Attribute::Speculatable:
 case Attribute::StackAlignment:
 case Attribute::WillReturn:
 case Attribute::WriteOnly:
 case Attribute::AllocKind:
 case Attribute::PresplitCoroutine:
+case Attribute::NoReadThreadID:
   continue;
 // Those attributes should be safe to propagate to the extracted function.
 case Attribute::AlwaysInline:
 case Attribute::Cold:
 case Attribute::DisableSanitizerInstrumentation:
 case Attribute::FnRetThunkExtern:
 case Attribute::Hot:
 case Attribute::NoRecurse:

llvm/test/Transforms/Coroutines/coro-readnone-01.ll

This file was added.

				; Tests that readnone calls which cross suspend points aren't misoptimized.
				; RUN: opt < %s -S -passes='default<O3>' | FileCheck %s --check-prefixes=CHECK,CHECK_SPLITTED
				; RUN: opt < %s -S -passes='early-cse' | FileCheck %s --check-prefixes=CHECK,CHECK_UNSPLITTED
				; RUN: opt < %s -S -passes='gvn' | FileCheck %s --check-prefixes=CHECK,CHECK_UNSPLITTED
				; RUN: opt < %s -S -passes='newgvn' | FileCheck %s --check-prefixes=CHECK,CHECK_UNSPLITTED

				define ptr @f() presplitcoroutine {
				entry:
				%id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call ptr @malloc(i32 %size)
				%hdl = call ptr @llvm.coro.begin(token %id, ptr %alloc)
				%j = call i32 @readnone_func() readnone
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%i = call i32 @readnone_func() readnone
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
				call void @free(ptr %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(ptr %hdl, i1 0)
				ret ptr %hdl
				}

				; Tests that normal functions aren't affected.
				define i1 @normal_function() {
				entry:
				%i = call i32 @readnone_func() readnone
				%j = call i32 @readnone_func() readnone
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				ret i1 true

				diff:
				call void @print_diff()
				ret i1 false
				}

				; CHECK_SPLITTED-LABEL: normal_function(
				; CHECK_SPLITTED-NEXT: entry
				; CHECK_SPLITTED-NEXT: call i32 @readnone_func()
				; CHECK_SPLITTED-NEXT: call void @print_same()
				; CHECK_SPLITTED-NEXT: ret i1 true
				;
				; CHECK_SPLITTED-LABEL: f.resume(
				; CHECK_UNSPLITTED-LABEL: @f(
				; CHECK: br i1 %cmp, label %same, label %diff
				; CHECK-EMPTY:
				; CHECK-NEXT: same:
				; CHECK-NEXT: call void @print_same()
				; CHECK-NEXT: br label
				; CHECK-EMPTY:
				; CHECK-NEXT: diff:
				; CHECK-NEXT: call void @print_diff()
				; CHECK-NEXT: br label

				declare i32 @readnone_func() readnone

				declare void @print_same()
				declare void @print_diff()
				declare ptr @llvm.coro.free(token, ptr)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, ptr, ptr, ptr)
				declare i1 @llvm.coro.alloc(token)
				declare ptr @llvm.coro.begin(token, ptr)
				declare i1 @llvm.coro.end(ptr, i1)

				declare noalias ptr @malloc(i32)
				declare void @free(ptr)

llvm/test/Transforms/Coroutines/coro-readnone-02.ll

This file was added.

				; Tests that readnone calls which don't cross suspend points are optimized as expected after the split.
				;
				; RUN: opt < %s -S -passes='default<O3>' | FileCheck %s --check-prefixes=CHECK_SPLITTED
				; RUN: opt < %s -S -passes='coro-split,early-cse,simplifycfg' | FileCheck %s --check-prefixes=CHECK_SPLITTED
				; RUN: opt < %s -S -passes='coro-split,gvn,simplifycfg' | FileCheck %s --check-prefixes=CHECK_SPLITTED
				; RUN: opt < %s -S -passes='coro-split,newgvn,simplifycfg' | FileCheck %s --check-prefixes=CHECK_SPLITTED
				; RUN: opt < %s -S -passes='early-cse' | FileCheck %s --check-prefixes=CHECK_UNSPLITTED
				; RUN: opt < %s -S -passes='gvn' | FileCheck %s --check-prefixes=CHECK_UNSPLITTED
				; RUN: opt < %s -S -passes='newgvn' | FileCheck %s --check-prefixes=CHECK_UNSPLITTED

				define ptr @f() presplitcoroutine {
				entry:
				%id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call ptr @malloc(i32 %size)
				%hdl = call ptr @llvm.coro.begin(token %id, ptr %alloc)
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%i = call i32 @readnone_func() readnone
				; No-op call that keeps the two consecutive readnone calls from being trivially combined.
				call void @nop()
				%j = call i32 @readnone_func() readnone
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
				call void @free(ptr %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(ptr %hdl, i1 0)
				ret ptr %hdl
				}

				;
				; CHECK_SPLITTED-LABEL: f.resume(
				; CHECK_SPLITTED-NEXT: :
				; CHECK_SPLITTED-NEXT: call i32 @readnone_func() #[[ATTR_NUM:[0-9]+]]
				; CHECK_SPLITTED-NEXT: call void @nop()
				; CHECK_SPLITTED-NEXT: call void @print_same()
				;
				; CHECK_SPLITTED: attributes #[[ATTR_NUM]] = { readnone }
				;
				; CHECK_UNSPLITTED-LABEL: @f(
				; CHECK_UNSPLITTED: br i1 %cmp, label %same, label %diff
				; CHECK_UNSPLITTED-EMPTY:
				; CHECK_UNSPLITTED-NEXT: same:
				; CHECK_UNSPLITTED-NEXT: call void @print_same()
				; CHECK_UNSPLITTED-NEXT: br label
				; CHECK_UNSPLITTED-EMPTY:
				; CHECK_UNSPLITTED-NEXT: diff:
				; CHECK_UNSPLITTED-NEXT: call void @print_diff()
				; CHECK_UNSPLITTED-NEXT: br label

				declare i32 @readnone_func() readnone
				declare void @nop()

				declare void @print_same()
				declare void @print_diff()
				declare ptr @llvm.coro.free(token, ptr)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, ptr, ptr, ptr)
				declare i1 @llvm.coro.alloc(token)
				declare ptr @llvm.coro.begin(token, ptr)
				declare i1 @llvm.coro.end(ptr, i1)

				declare noalias ptr @malloc(i32)
				declare void @free(ptr)

llvm/test/Transforms/Coroutines/coro-readnone-03.ll

This file was added.

				; Tests that calls marked both readnone and noread_thread_id which cross suspend points are optimized as expected.
				; RUN: opt < %s -S -passes='default<O3>' | FileCheck %s --check-prefixes=CHECK-O3
				; RUN: opt < %s -S -passes='early-cse' | FileCheck %s --check-prefixes=CHECK
				; RUN: opt < %s -S -passes='gvn' | FileCheck %s --check-prefixes=CHECK
				; RUN: opt < %s -S -passes='newgvn' | FileCheck %s --check-prefixes=CHECK

				define ptr @f() presplitcoroutine {
				entry:
				%id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call ptr @malloc(i32 %size)
				%hdl = call ptr @llvm.coro.begin(token %id, ptr %alloc)
				%j = call i32 @readnone_func() readnone noread_thread_id
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%i = call i32 @readnone_func() readnone noread_thread_id
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
				call void @free(ptr %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(ptr %hdl, i1 0)
				ret ptr %hdl
				}

				; CHECK-O3: define{{.*}}@f.resume(
				; CHECK-O3-NEXT : CoroEnd:
				; CHECK-O3-NEXT : tail call void @print_same()
				; CHECK-O3-NEXT : tail call void @free(ptr nonnull %hdl)
				; CHECK-O3-NEXT : ret void

				; CHECK-LABEL: @f(
				; CHECK: resume:
				; CHECK: br i1 true, label %same, label %diff

				declare i32 @readnone_func() readnone noread_thread_id

				declare void @print_same()
				declare void @print_diff()
				declare ptr @llvm.coro.free(token, ptr)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, ptr, ptr, ptr)
				declare i1 @llvm.coro.alloc(token)
				declare ptr @llvm.coro.begin(token, ptr)
				declare i1 @llvm.coro.end(ptr, i1)

				declare noalias ptr @malloc(i32)
				declare void @free(ptr)

Revision Contents

Diff 455209
