This is an archive of the discontinued LLVM Phabricator instance.

Today I started reading through the whole Attributor, or at least its core parts, because I think otherwise I won't
have a good enough understanding to proceed. This is the start of a series of small patches that will mostly be NFC
improvements to docs with which I hope I'll have the opportunity to ask questions about different decisions in the Attributor.

Starting off, I tried multiple times to read IRPosition and in my humble opinion, it seems way too complicated.
Specifically the KindOrArgNo. This coupling of basically 2 kinds of info in one seems unnecessary. Plus, IRP currently
saves a pointer (Value *, the anchor value) and an int. For most users of LLVM, a pointer will take 8 bytes. Because of alignment,
we will use another 8 bytes anyway for KindOrArgNo, not 4. And I think it will be so much simpler to use the wasted 4 bytes
to decouple the values. That is, one int for the kind and one int for the argument number.

If we all agree on that, I'm of course willing to do the related patches.
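The size argument above can be checked with a small sketch. The two structs are hypothetical stand-ins that model only the data members mentioned in the comment (the real `IRPosition` also has a virtual destructor, which is omitted here), and the check assumes a typical 64-bit target with 8-byte pointers and 4-byte ints.

```cpp
#include <cstddef>

// Stand-in for the current layout: one pointer plus one combined int.
struct PackedPosition {
  void *AnchorVal;
  int KindOrArgNo;
};

// Stand-in for the proposed layout: one pointer plus two separate ints.
struct SplitPosition {
  void *AnchorVal;
  int Kind;
  int ArgNo;
};

// True when the second int fits into what would otherwise be tail padding.
constexpr bool splitIsFree() {
  return sizeof(SplitPosition) == sizeof(PackedPosition);
}
```

On such a target the second int occupies bytes that would otherwise be padding, so both structs should have the same size. On a 32-bit target the split layout would be one int larger.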

Harbormaster completed remote builds in B49212: Diff 250357. Mar 14 2020, 8:01 AM

In D76175#1922874, @baziotis wrote:

Today I started reading through the whole Attributor, or at least its core parts, because I think otherwise I won't
have a good enough understanding to proceed. This is the start of a series of small patches that will mostly be NFC
improvements to docs with which I hope I'll have the opportunity to ask questions about different decisions in the Attributor.

This seems fine, as does the initiative. A few comments, though.

Starting off, I tried multiple times to read IRPosition and in my humble opinion, it seems way too complicated.
Specifically the KindOrArgNo. This coupling of basically 2 kinds of info in one seems unnecessary.

I kind of like how this is working, but if others agree you can change it.

llvm/include/llvm/Transforms/IPO/Attributor.h
38	don't think this is necessary.
278	I think this assert is also redundant, given that it is called again in `getAnchorValue()`
280	This makes sense. Now that I look at the other parts we seem to mix these two a lot. I think the better solution is to inline `getAnchorScope` here and replace all the call sites with `getAssociatedFunction` as it's clearer (at least to me).

I kind of like how this is working, but if others agree you can change it.

I appreciate it, it may be that I was not involved when it was written. It just seems
to be unclear how it's working. Consider this:

/// Create a position describing the argument \p Arg.
static const IRPosition argument(const Argument &Arg) {
  return IRPosition(const_cast<Argument &>(Arg), Kind(Arg.getArgNo()));
}

Arg.getArgNo() can be 1, in which case, the respective Kind is IRP_CALL_SITE_ARGUMENT. At least to me, it makes
no sense why we want that. Again, it might be that only I don't get it but I have a lot of "why this happens?" moments
when reading different parts of the Attributor that have to do with arguments.
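To make the concern concrete, here is a minimal sketch of what the `Kind(Arg.getArgNo())` conversion does numerically. The enumerator values are copied from the `Kind` enum in the diff; `kindFromArgNo` is a hypothetical helper that exists only for this illustration.

```cpp
// Enumerator values copied from IRPosition::Kind in the diff under review.
enum Kind : int {
  IRP_INVALID = -6,
  IRP_FLOAT = -5,
  IRP_RETURNED = -4,
  IRP_CALL_SITE_RETURNED = -3,
  IRP_FUNCTION = -2,
  IRP_CALL_SITE = -1,
  IRP_ARGUMENT = 0,
  IRP_CALL_SITE_ARGUMENT = 1,
};

// Hypothetical helper mirroring the conversion used in IRPosition::argument().
constexpr Kind kindFromArgNo(unsigned ArgNo) { return Kind(ArgNo); }
```

Argument number 1 compares equal to `IRP_CALL_SITE_ARGUMENT`, and argument number 3 yields a value that matches no named enumerator at all. So the stored integer alone cannot say whether a position is a formal argument or a call site argument.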

llvm/include/llvm/Transforms/IPO/Attributor.h
278	Yes, although I don't understand why it's not at the start of this function.
280	Yes, I like that too.

In D76175#1922874, @baziotis wrote:

Today I started reading through the whole Attributor, or at least its core parts, because I think otherwise I won't
have a good enough understanding to proceed. This is the start of a series of small patches that will mostly be NFC
improvements to docs with which I hope I'll have the opportunity to ask questions about different decisions in the Attributor.

Great! It's about time we give it an overhaul; a lot of the documentation is still from the very beginning as well. Feel free
to continue issuing design suggestions.

Starting off, I tried multiple times to read IRPosition and in my humble opinion, it seems way too complicated.
Specifically the KindOrArgNo. This coupling of basically 2 kinds of info in one seems unnecessary. Plus, IRP currently
saves a pointer (Value *, the anchor value) and an int. For most users of LLVM, a pointer will take 8 bytes. Because of alignment,
we will use another 8 bytes anyway for KindOrArgNo, not 4. And I think it will be so much simpler to use the wasted 4 bytes
to decouple the values. That is, one int for the kind and one int for the argument number.

If we all agree on that, I'm of course willing to do the related patches.

You can split the two apart but fit them into at most 64bit, maybe even 32.
FWIW, I was hoping to add a position (=Instruction) to this class (or a subclass) at some point so we can do flow sensitive queries.

I'm fine with this. @sstefan1 please accept once you are satisfied.

llvm/include/llvm/Transforms/IPO/Attributor.h
168	FWIW. The encoding allows to check for arguments by doing `value >= 0`. If that is true, the argument number is the value.
278	not redundant but should be at the start. If it's invalid we crash on the dyn_cast otherwise (or worse)

In D76175#1923087, @jdoerfert wrote:

In D76175#1922874, @baziotis wrote:

Today I started reading through the whole Attributor, or at least its core parts, because I think otherwise I won't
have a good enough understanding to proceed. This is the start of a series of small patches that will mostly be NFC
improvements to docs with which I hope I'll have the opportunity to ask questions about different decisions in the Attributor.

Great! It's about time we give it an overhaul; a lot of the documentation is still from the very beginning as well. Feel free
to continue issuing design suggestions.

Starting off, I tried multiple times to read IRPosition and in my humble opinion, it seems way too complicated.
Specifically the KindOrArgNo. This coupling of basically 2 kinds of info in one seems unnecessary. Plus, IRP currently
saves a pointer (Value *, the anchor value) and an int. For most users of LLVM, a pointer will take 8 bytes. Because of alignment,
we will use another 8 bytes anyway for KindOrArgNo, not 4. And I think it will be so much simpler to use the wasted 4 bytes
to decouple the values. That is, one int for the kind and one int for the argument number.

If we all agree on that, I'm of course willing to do the related patches.

You can split the two apart but fit them into at most 64bit, maybe even 32.

Um, right now 64 bits is only the pointer. How would one fit a pointer and 2 ints in 64 (or 32) bits?
Right now, if I understand correctly, the pointer and the int take about 12 bytes (96 bits). And probably 16 bytes because of the alignment.

FWIW, I was hoping to add a position (=Instruction) to this class (or a subclass) at some point so we can do flow sensitive queries.

Ok, good. I don't see right now how flow-sensitivity helps in IRP, but I'll leave it for when the time comes to introduce it.

I'm fine with this. @sstefan1 please accept once you are satisfied.

llvm/include/llvm/Transforms/IPO/Attributor.h
168	But how do you know if it's a call site argument or a formal parameter (i.e. `IRP_ARGUMENT`) ? Since argument numbers start from 0, this is problematic (e.g. see the previous comment regarding the `argument()` constructor). Plus then there are a lot of `switch`es that if say we have argument number == 3 don't seem to be that clear. I'm mainly asking in order to be able to understand the current code so that I can change it.
278	Yes, ok, that's why I mentioned that its position seemed weird. I'll change it.

sizeof(Kind + ArgNo) <= 64
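As a rough illustration of that bound, the kind and the argument number can be kept as separate fields and still fit in 64 or even 32 bits. The struct names and field widths below are assumptions made for this sketch, not something proposed in the review. The 16-bit variant in particular assumes argument numbers stay below 32768.

```cpp
#include <cstdint>

// Two 32-bit fields: 64 bits in total.
struct KindAndArgNo64 {
  std::int32_t Kind;
  std::int32_t ArgNo;
};

// Two 16-bit fields: 32 bits in total (assumes small argument numbers).
struct KindAndArgNo32 {
  std::int16_t Kind;
  std::int16_t ArgNo;
};

static_assert(sizeof(KindAndArgNo64) * 8 == 64, "kind + argno fit in 64 bits");
static_assert(sizeof(KindAndArgNo32) * 8 == 32, "kind + argno fit in 32 bits");
```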

jdoerfert added inline comments. Mar 14 2020, 6:56 PM

llvm/include/llvm/Transforms/IPO/Attributor.h
168	If the value is smaller than 0, it is of the type specified by the value, thus the value is the "kind". If the value is at least 0 and the associated value is an llvm::Argument, it is an argument at position value. If the value is at least 0 and the associated value is an llvm::CallBase, it is a call site argument at position value.
280	These are different things for call site [arguments]
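The decoding rule described in the comment on line 168 can be sketched as follows. This is a self-contained model, not the Attributor's code: `AnchorType` is a hypothetical stand-in for the `isa<Argument>` / `isa<CallBase>` checks on the anchor value, and returning `IRP_INVALID` for any other anchor with a non-negative value is an assumption of this sketch.

```cpp
// Enumerator values copied from IRPosition::Kind in the diff under review.
enum Kind : int {
  IRP_INVALID = -6,
  IRP_FLOAT = -5,
  IRP_RETURNED = -4,
  IRP_CALL_SITE_RETURNED = -3,
  IRP_FUNCTION = -2,
  IRP_CALL_SITE = -1,
  IRP_ARGUMENT = 0,
  IRP_CALL_SITE_ARGUMENT = 1,
};

// Hypothetical stand-in for the dynamic type of the anchor value.
enum class AnchorType { Argument, CallBase, Other };

// Negative values are the kind itself. Non-negative values are argument
// numbers, and the anchor type decides which kind of argument it is.
Kind decodeKind(int KindOrArgNo, AnchorType Anchor) {
  if (KindOrArgNo < 0)
    return static_cast<Kind>(KindOrArgNo);
  if (Anchor == AnchorType::Argument)
    return IRP_ARGUMENT;
  if (Anchor == AnchorType::CallBase)
    return IRP_CALL_SITE_ARGUMENT;
  return IRP_INVALID; // Assumption: no other anchor carries an argument number.
}

// Returns the argument number, or -1 if the value encodes a kind instead.
int decodeArgNo(int KindOrArgNo) {
  return KindOrArgNo >= 0 ? KindOrArgNo : -1;
}
```

The same stored value, say 1, therefore decodes to a formal argument or a call site argument depending only on the anchor.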

sstefan1 accepted this revision. Mar 15 2020, 3:23 AM

sstefan1 added inline comments.

llvm/include/llvm/Transforms/IPO/Attributor.h
280	I agree, my bad. I think that there are at least a few places where we use `getAssociatedFunction` instead of `getAnchorScope`. Not necessary in this patch.

This revision is now accepted and ready to land. Mar 15 2020, 3:23 AM

sizeof(Kind + ArgNo) <= 64

Oh bytes, yes. With the change they will take 16 bytes (as now).

@sstefan1
Thanks for accepting. I'll upload another diff and leave other, bigger changes (like the kind thing or the getAnchorScope() change, if it gets agreed upon) for follow-ups.

llvm/include/llvm/Transforms/IPO/Attributor.h
168	Oh ok, using `dyn_cast` or sth on the `AnchorVal`. Thanks, I thought it was somehow done only with `KindOrArgNo` and I missed it.
280	So, yesterday I saw quite late this `getCalledFunction()`, which should create problems, and it did. I left it for today to try to combine them better (in `getAssociatedFunction()`), but it seems we can't. Whenever we do a CGSCC pass, for example, the result for `call` instructions is different (or when we get the scope for `AAValueRange`). It may actually be a good idea to decouple them further rather than combining them.

Specifically, regarding "I think that there are at least few places that we use getAssociatedFunction instead of getAnchorScope": to me this is a problem because `getAssociatedFunction()` does not have a coherent behavior. It's like "I'll give you the parent function most of the time, except if the anchor value is a CallBase". So, when you see it used in the code, it's unclear whether the one who wrote it thought:

1. I know the anchor value is a CallBase, I expect the callee back.
2. I know it's not, I want the parent function of the AnchorVal.
3. I want whatever scope, as long as it is a function.

I think:

- Case 1 calls should be left as they are.
- Case 2 calls should be replaced with getAnchorScope.
- Case 3 calls should be eliminated because they're unclear.

Plus, if that happens, we should change `getAssociatedFunction()` to only return the callee and otherwise null (i.e. remove the ambiguity). See follow-up diff.
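The difference between the two accessors can be illustrated with a small self-contained model. All types here are hypothetical stand-ins and do not use LLVM headers. The model keeps the current fallback behavior of `getAssociatedFunction()`, and it performs the validity check first, as suggested in the comment on line 278.

```cpp
#include <cassert>

struct Function {
  const char *Name;
};

// Hypothetical stand-in for an anchor value.
struct Anchor {
  enum Type { FunctionTy, ArgumentTy, CallTy } Ty;
  Function *Parent; // Enclosing function (the function itself for FunctionTy).
  Function *Callee; // Only meaningful for CallTy; null for indirect calls.
};

struct Position {
  const Anchor *AnchorVal; // Null models an invalid position.

  // Always the function that contains the anchor.
  Function *getAnchorScope() const {
    assert(AnchorVal && "Invalid position does not have an anchor scope!");
    return AnchorVal->Parent;
  }

  // The callee for call anchors, otherwise the same as the anchor scope.
  Function *getAssociatedFunction() const {
    assert(AnchorVal && "Invalid position does not have an anchor scope!");
    if (AnchorVal->Ty == Anchor::CallTy)
      return AnchorVal->Callee;
    return getAnchorScope();
  }
};
```

For a call anchored in `caller` that targets `callee`, the anchor scope is `caller` while the associated function is `callee`. For an argument anchor both accessors return the same function, which is why the two are easy to mix up at call sites.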

Replacement of getAssociatedFunction() with getAnchorScope() where I think it's almost certain we want the parent function (we can revert that if you don't like it).
Other small refactorings.

omarahmed added a subscriber: omarahmed. Mar 15 2020, 11:34 AM

Committing?

I'll re-accept :)

llvm/lib/Transforms/IPO/Attributor.cpp
8652	typo: cache.

LGTM, but this is refactoring rather than just fixing typos :)

Thanks to both!

@uenoku It fixes some typos but with the discussion, I also did some refactoring. I'll change the description.

baziotis retitled this revision from [Attributor][NFC] Typos in doc to [Attributor][NFC] Refactorings and typos in doc. Mar 23 2020, 10:21 AM

Closed by commit rGa650d555fc21: [Attributor][NFC] Refactorings and typos in doc (authored by baziotis). Mar 23 2020, 2:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/IPO/Attributor.h	19 lines
llvm/lib/Transforms/IPO/Attributor.cpp	2 lines

Diff 250357

llvm/include/llvm/Transforms/IPO/Attributor.h

[... 23 lines not shown ...]
 // information to other abstract attributes in-flight but we might not want to
 // manifest the information. The Attributor allows to query in-flight abstract
 // attributes through the `Attributor::getAAFor` method (see the method
 // description for an example). If the method is used by an abstract attribute
 // P, and it results in an abstract attribute Q, the Attributor will
 // automatically capture a potential dependence from Q to P. This dependence
 // will cause P to be reevaluated whenever Q changes in the future.
 //
-// The Attributor will only reevaluated abstract attributes that might have
+// The Attributor will only reevaluate abstract attributes that might have
 // changed since the last iteration. That means that the Attribute will not
 // revisit all instructions/blocks/functions in the module but only query
 // an update from a subset of the abstract attributes.
 //
 // The update method `AbstractAttribute::updateImpl` is implemented by the
-// specific "abstract attribute" subclasses. The method is invoked whenever the
-// currently assumed state (see the AbstractState class) might not be valid
+// specific "abstract attribute"(AA*) subclasses. The method is invoked whenever
+// the currently assumed state (see the AbstractState class) might not be valid
 // anymore. This can, for example, happen if the state was dependent on another
 // abstract attribute that changed. In every invocation, the update method has
 // to adjust the internal state of an abstract attribute to a point that is
 // justifiable by the underlying IR and the current state of abstract attributes
 // in-flight. Since the IR is given and assumed to be valid, the information
 // derived from it can be assumed to hold. However, information derived from
 // other abstract attributes is conditional on various things. If the justifying
 // state changed, the `updateImpl` has to revisit the situation and potentially
[... 99 lines not shown ...]
 /// as well as a distinction between call sites and functions. Finally, there
 /// are floating values that do not have a corresponding attribute list
 /// position.
 struct IRPosition {
   virtual ~IRPosition() {}

   /// The positions we distinguish in the IR.
   ///
-  /// The values are chosen such that the KindOrArgNo member has a value >= 1
-  /// if it is an argument or call site argument while a value < 1 indicates the
+  /// The values are chosen such that the KindOrArgNo member has a value >= 0
+  /// if it is an argument or call site argument while a value < 0 indicates the
   /// respective kind of that value.
   enum Kind : int {
     IRP_INVALID = -6, ///< An invalid position.
     IRP_FLOAT = -5, ///< A position that is not associated with a spot suitable
                     ///< for attributes. This could be any value or instruction.
     IRP_RETURNED = -4, ///< An attribute for the function return value.
     IRP_CALL_SITE_RETURNED = -3, ///< An attribute for a call site return value.
     IRP_FUNCTION = -2, ///< An attribute for a function (scope).
     IRP_CALL_SITE = -1, ///< An attribute for a call site (function scope).
     IRP_ARGUMENT = 0, ///< An attribute for a function argument.
     IRP_CALL_SITE_ARGUMENT = 1, ///< An attribute for a call site argument.
   };

   /// Default constructor available to create invalid positions implicitly. All
   /// other positions need to be created explicitly through the appropriate
   /// static member function.
   IRPosition() : AnchorVal(nullptr), KindOrArgNo(IRP_INVALID) { verify(); }

   /// Create a position describing the value of \p V.
   static const IRPosition value(const Value &V) {
[... 93 lines not shown ...]
     assert(KindOrArgNo != IRP_INVALID &&
            "Invalid position does not have an anchor value!");
     return *AnchorVal;
   }

   /// Return the associated function, if any.
   Function *getAssociatedFunction() const {
     if (auto *CB = dyn_cast<CallBase>(AnchorVal))
       return CB->getCalledFunction();
     assert(KindOrArgNo != IRP_INVALID &&
            "Invalid position does not have an anchor scope!");
-    Value &V = getAnchorValue();
-    if (isa<Function>(V))
-      return &cast<Function>(V);
-    if (isa<Argument>(V))
-      return cast<Argument>(V).getParent();
-    if (isa<Instruction>(V))
-      return cast<Instruction>(V).getFunction();
-    return nullptr;
+    return getAnchorScope();
   }

   /// Return the associated argument, if any.
   Argument *getAssociatedArgument() const;

   /// Return true if the position refers to a function interface, that is the
   /// function scope, the function return, or an argument.
   bool isFnInterfaceKind() const {
[... 2,499 lines not shown ...]

llvm/lib/Transforms/IPO/Attributor.cpp

[... 8,312 lines not shown ...]
   for (Instruction &I : instructions(&F)) {
     // To allow easy access to all instructions in a function with a given
     // opcode we store them in the InfoCache. As not all opcodes are interesting
     // to concrete attributes we only cache the ones that are as identified in
     // the following switch.
     // Note: There are no concrete attributes now so this is initially empty.
     switch (I.getOpcode()) {
     default:
       assert((!ImmutableCallSite(&I)) && (!isa<CallBase>(&I)) &&
-             "New call site/base instruction type needs to be known int the "
+             "New call site/base instruction type needs to be known in the "
              "Attributor.");
       break;
     case Instruction::Load:
       // The alignment of a pointer is interesting for loads.
     case Instruction::Store:
       // The alignment of a pointer is interesting for stores.
     case Instruction::Call:
     case Instruction::CallBr:
[... 314 lines not shown ...]
   LLVM_DEBUG(dbgs() << "[Attributor] Run on module with " << Functions.size()
                     << " functions.\n");

   // Create an Attributor and initially empty information cache that is filled
   // while we identify default attribute opportunities.
   Attributor A(Functions, InfoCache, CGUpdater, DepRecInterval);

   for (Function *F : Functions)
     A.initializeInformationCache(*F);

   for (Function *F : Functions) {
     if (F->hasExactDefinition())
       NumFnWithExactDefinition++;
     else
       NumFnWithoutExactDefinition++;

     // We look at internal functions only on-demand but if any use is not a
     // direct call or outside the current set of analyzed functions, we have to
[... 305 lines not shown ...]
