This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Build assume from call
ClosedPublic

Authored by Tyker on Jan 9 2020, 12:32 PM.

Details

Summary

this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.

Diff Detail

Event Timeline

Tyker created this revision.Jan 9 2020, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2020, 12:32 PM
Tyker updated this revision to Diff 237163.Jan 9 2020, 12:35 PM

Some high-level comments, haven't gone through it in detail

llvm/include/llvm/Transforms/Utils/AssumeBuilder.h
30 ↗(On Diff #237163)

We need to add documentation and probably change the variable/function names to match the convention.

llvm/lib/IR/Attributes.cpp
201

This can probably go in the other patch, right?

llvm/lib/Transforms/Utils/AssumeBuilder.cpp
18 ↗(On Diff #237163)

Why do we need the detail_assume_builder namespace at all?

24 ↗(On Diff #237163)

Since this is not always sound I am unsure what it would be used for.
Instead, you can look for assumes in the vicinity, probably always prior to the current position, but with consideration of willreturn and nounwind.

llvm/test/Transforms/Util/assume-builder.ll
56

Why is the address space explicit here?


I'm unsure if we need to preserve function (scope) attributes, at least until we get scoped assumes that actually scope the information, e.g. after inlining.

Tyker marked 5 inline comments as done.Jan 9 2020, 1:15 PM
Tyker added inline comments.
llvm/include/llvm/Transforms/Utils/AssumeBuilder.h
30 ↗(On Diff #237163)

yeah. i will fix that.

llvm/lib/IR/Attributes.cpp
201

getNameFromAttrKind is used in the to build the assume. but it could be moved in the previous patch.

llvm/lib/Transforms/Utils/AssumeBuilder.cpp
18 ↗(On Diff #237163)

it can be probably be removed. but i consider builderState a "common" name and the builderState shouldn't be used outside of AssumeBuilder.*

24 ↗(On Diff #237163)

it is just intended to be used for testing of the range version of the function.
the pass doesn't insert instruction it just output what it would look like if it did.

i made it look the same so i can use python scripts to update tests. but instruction are not inserted.

llvm/test/Transforms/Util/assume-builder.ll
56

Why is the address space explicit here?

i think it is because the llvm.assume are not inserted but printed as if they where. so the printing option may be different.

I'm unsure if we need to preserve function (scope) attributes, at least until we get scoped assumes that actually scope the information, e.g. after inlining.

by function (scope) attributes do you mean attributes on the argument of the declaration of a function ?
if so i think they can give us information about the call site. but if we can assume they are always propagated to the callsite we can skip them.

Tyker updated this revision to Diff 237181.Jan 9 2020, 1:19 PM
jdoerfert added inline comments.Jan 9 2020, 1:48 PM
llvm/lib/Transforms/Utils/AssumeBuilder.cpp
18 ↗(On Diff #237163)

Then move it into the cpp file and only declare functions that deal with its members in the header.

24 ↗(On Diff #237163)

Oh, why do we need a printer if we could actually do the tranformation? We can use unit tests to test this.

llvm/test/Transforms/Util/assume-builder.ll
56

Yes the function declaration attributes, e.g., nounwind.

The use case I have in mind would be to create an llvm.assume if you are about to move/delete/modify the call.
Having information about the arguments helps then, having information about the function does not, except if we would have a scope in which we know the information is good. I mean, if you inline a call site with an attribute XYZ but these attributes are all "scoped". Knowing XYZ holds at a single location is not helpful, at least I can't think of many function attributes for which this would be helpful in a non-scoped case.

Tyker updated this revision to Diff 237213.Jan 9 2020, 2:56 PM
Tyker marked 3 inline comments as done.

i changed the pass and the test to actually perform the transformation. it is less confusing this way.

llvm/test/Transforms/Util/assume-builder.ll
56

i think most function attributes aren't useful. but some are for example cold if we hit a call to a function marked cold, this is useful.
and we can't differentiate between useful and useless ones.

Tyker marked 2 inline comments as done.Jan 9 2020, 2:57 PM
jdoerfert added inline comments.Jan 9 2020, 3:24 PM
llvm/include/llvm/Transforms/Utils/AssumeBuilder.h
24 ↗(On Diff #237213)

Mention that the instruction is not inserted into a basic block.

31 ↗(On Diff #237213)

all but the run method don't seem to be needed (if this is a struct).

llvm/lib/Transforms/Utils/AssumeBuilder.cpp
25 ↗(On Diff #237213)

I usually favor structs with named members over tuples. It makes especially the uses way more readable, e.g., it says Elem.Name not Elem.get<0>().

29 ↗(On Diff #237213)

No need for Builder, this is a member. Same elsewhere.

44 ↗(On Diff #237213)

getCalledFunction might be null.

45 ↗(On Diff #237213)

don't use these constants but AttributeList::ReturnIndex, AttributeList::FunctionIndex, AttributeList::FirstArgIndex.

55 ↗(On Diff #237213)

Early exist are preferred:

if (XXX.empty())
  return nullptr;
/// other code
59 ↗(On Diff #237213)

I'd prefer the type instead of auto, which is simple if we make it a struct, but either way it should be const XXXX & whenever possible.

77 ↗(On Diff #237213)

Formatting: The methods should start with a lower case letter.

Maybe add a TODO mentioning that we should look in the vicinity for other assumes we can merge the information with.

85 ↗(On Diff #237213)

I'd just get the module from the instruction. I->getModule()

llvm/test/Transforms/Util/assume-builder.ll
56

Sure, let's keep them for now.


We need a test in which a property holds for multiple arguments.

fhahn added a subscriber: fhahn.Jan 10 2020, 2:09 AM
Tyker updated this revision to Diff 237347.Jan 10 2020, 8:55 AM
Tyker marked 14 inline comments as done.

fixed comments.

llvm/lib/Transforms/Utils/AssumeBuilder.cpp
77 ↗(On Diff #237213)

Maybe add a TODO mentioning that we should look in the vicinity for other assumes we can merge the information with.

to do this the BuildAssumeFromInst will have to change and insert the assume. i don't know if it is preferable.

85 ↗(On Diff #237213)

the module can't be const because of Intrinsic::getDeclaration requires a non-const Module.

Only minor points are left. including:
Can we make string attributes and function scope attributes optional (under a command line flag)? Especially the uses created for the latter could easily confuse the call graph and similar passes. That said, we need to teach them to ignore llvm.assume uses (see below).

Next steps (in no particular order):

  • Teach passes sensitive to uses, especially instcombine, to ignore uses in an llvm.assume.
  • Teach passes, e.g., the Attributor, to use the information provided this way.
  • Emit llvm.assume when we move/delete call instructions.
  • Perform some experiments.
  • Revisit the list to get agreement.
llvm/lib/Transforms/Utils/AssumeBuilder.cpp
42 ↗(On Diff #237347)

Documentation for the class and members is missing.

Nit: Maybe KnowledgeSet or sth. similar is a better name, BundlSet implies these are "already" operand bundles.

70 ↗(On Diff #237347)

You can directly capture the result:
if (Function *Fn = Call->getCalledFunction())

83 ↗(On Diff #237347)

Should we reverse the order here? That way the first argument is always the "WasOn" one.

113 ↗(On Diff #237347)

Nit: one can use for (Instruction &I : instructions(F)) to save one loop level here

77 ↗(On Diff #237213)

to do this the BuildAssumeFromInst will have to change and insert the assume. i don't know if it is preferable.

Fair point. We can investigate that in the future. Maybe a method to locate an existing assume, one to combine the information, one to create the new one and replace the old. Or a dedicated pass that moves them around and combines them. I think less assumes is "generally speaking" better but we don't need this now.

Tyker updated this revision to Diff 237552.Jan 12 2020, 10:55 AM
Tyker marked 5 inline comments as done.

fixed comments.

Next steps (in no particular order):

  • Teach passes sensitive to uses, especially instcombine, to ignore uses in an llvm.assume.
  • Teach passes, e.g., the Attributor, to use the information provided this way.
  • Emit llvm.assume when we move/delete call instructions.
  • Perform some experiments.
  • Revisit the list to get agreement.

I agree with the tasks.
But i think that we also need to build an API to query those new assumes before using them. First to have "good names" instead of indexes in the operand bundle and also because we don't want users to directly depend on the representation. If at some point we want to group knowledge by attributes.
example: for

call void @func(i32* nonnull %P1, i32* nonnull %P)

build

call void @llvm.assume(i1 true) [ "nonnull"(i32* %P1, i32* %P)]

instead of

call void @llvm.assume(i1 true) [ "nonnull"(i32* %P1), "nonnull"(i32* %P)]

this change will be harder if passes access the operand bundle element directly.

But i think that we also need to build an API to query those new assumes before using them. First to have "good names" instead of indexes in the operand bundle and also because we don't want users to directly depend on the representation. If at some point we want to group knowledge by attributes.
example: for

call void @func(i32* nonnull %P1, i32* nonnull %P)

build

call void @llvm.assume(i1 true) [ "nonnull"(i32* %P1, i32* %P)]

instead of

call void @llvm.assume(i1 true) [ "nonnull"(i32* %P1), "nonnull"(i32* %P)]

this change will be harder if passes access the operand bundle element directly.

I agree that an API is useful, Since this is attribute-based it should be as well. Maybe something like

`bool getAttributeFromAssume(Value &Base, CallInst &AssumeCI, Attribute::Kind Kind, int &Value)`

and similar for string attributes. Alternatively we could return a map from WhatOn values to attributes.

I don't understand. What do you mean by "good names"? We have the operand bundle strings, right?
Agreed that an API should hide the actual encoding, thus the above grouping is something we should be able to do easily.

`bool getAttributeFromAssume(Value &Base, CallInst &AssumeCI, Attribute::Kind Kind, int &Value)`

and similar for string attributes. Alternatively we could return a map from WhatOn values to attributes.

something like that seem fine.

I don't understand. What do you mean by "good names"? We have the operand bundle strings, right?

I meant that in the operand bundle the first argument is the WasOn and the second is optionally the attributes value.
but users shouldn't have to use magic numbers to access those fields.

Agreed that an API should hide the actual encoding, thus the above grouping is something we should be able to do easily.

i don't know if the grouping is desirable it makes arguments in the bundle more confusing when there is the attribute has an optional argument.
but i wanted to have the opportunity to tweak the representation later.

is there any comment left on this patch ?

jdoerfert accepted this revision.Jan 13 2020, 8:37 PM
`bool getAttributeFromAssume(Value &Base, CallInst &AssumeCI, Attribute::Kind Kind, int &Value)`

and similar for string attributes. Alternatively we could return a map from WhatOn values to attributes.

something like that seem fine.

I don't understand. What do you mean by "good names"? We have the operand bundle strings, right?

I meant that in the operand bundle the first argument is the WasOn and the second is optionally the attributes value.
but users shouldn't have to use magic numbers to access those fields.

Oh, right, agreed!

Agreed that an API should hide the actual encoding, thus the above grouping is something we should be able to do easily.

i don't know if the grouping is desirable it makes arguments in the bundle more confusing when there is the attribute has an optional argument.
but i wanted to have the opportunity to tweak the representation later.

is there any comment left on this patch ?

I added some minor ones but I think this patch is otherwise fine. We should not land it right away though. If you write the API next, either of us can make use of it in the Attributor. Lastly we need to generate the calls in the first place and we can try this out on benchmarks.

llvm/include/llvm/Transforms/Utils/AssumeBuilder.h
20 ↗(On Diff #237552)

Nit: Maybe say we return an assume call instead of just "instruction"?

The sentence is broken I think, the end should be something like derived from \p I.

25 ↗(On Diff #237552)

Arguably, you only need to expose one version, e.g., the single instruction one (with const). In the impl. we can query the module.

llvm/lib/Transforms/Utils/AssumeBuilder.cpp
7 ↗(On Diff #237552)

Here and in the header we should add a short file comment describing what functionality is in here.

17 ↗(On Diff #237552)

Nit: We might not need all of these anymore.

This revision is now accepted and ready to land.Jan 13 2020, 8:37 PM
Tyker updated this revision to Diff 238139.EditedJan 14 2020, 4:41 PM
Tyker marked 6 inline comments as done.

i renamed the files to KnowledgeRetention.* to be more general. we can put other related utilities like the query API in the same files. i had a hard time finding a good name and i am not sure i succeeded what do you think ?

i also fixed an issue were the order of bundles was non-deterministic. to do so i had to add a sort, but this sort can be used later to find bundles by binary search in the API.

We should not land it right away though

is it because the creation of the clang-10 release branch is happening soon or because we need to benchmarks first ?

If you write the API next, either of us can make use of it in the Attributor.

we can make the API and the first user of the API come at the same time so we can test the API transitively ? or this would make the patch too big ? I haven't taken a detailed enough look at the Attributor but i need to get to know it anyway.

Lastly we need to generate the calls in the first place and we can try this out on benchmarks.

agreed

llvm/include/llvm/Transforms/Utils/AssumeBuilder.h
25 ↗(On Diff #237552)

the module can't be const because of Intrinsic::getDeclaration requires a non-const Module. so it can't be optained from a const Instruction.
the other solution is const_cast, but i don't like it so much.

i renamed the files to KnowledgeRetention.* to be more general. we can put other related utilities like the query API in the same files. i had a hard time finding a good name and i am not sure i succeeded what do you think ?

OK with me.

i also fixed an issue were the order of bundles was non-deterministic. to do so i had to add a sort, but this sort can be used later to find bundles by binary search in the API.

Sounds good.

We should not land it right away though

is it because the creation of the clang-10 release branch is happening soon or because we need to benchmarks first ?

Also because of that, and we do not have settled on this scheme in the RFC thread.

If you write the API next, either of us can make use of it in the Attributor.

we can make the API and the first user of the API come at the same time so we can test the API transitively ? or this would make the patch too big ? I haven't taken a detailed enough look at the Attributor but i need to get to know it anyway.

So we can have a user right away. Any location that looks for attributes right now is an opportunity to look at attributes retained in an llvm.assume. The final Attributor design will probably become a little involved because we want to cache
results, use the API to get all attribute mappings right away, ... A nice first use case could be isKnownNonZeroFromAssume in ValueTracking.cpp or the pass that does load/store alignment from assume.

Lastly we need to generate the calls in the first place and we can try this out on benchmarks.

agreed

Tyker updated this revision to Diff 238628.Jan 16 2020, 3:10 PM

minor improvements.

jdoerfert added inline comments.Jan 16 2020, 5:41 PM
llvm/lib/Transforms/Utils/KnowledgeRetention.cpp
69

I don't think we should use the name here. We can use the pointer value, if that makes sense, but not the name as it can be changed which would break the invariant.

Tyker updated this revision to Diff 238878.Jan 17 2020, 3:31 PM
Tyker marked an inline comment as done.

fixed.

llvm/lib/Transforms/Utils/KnowledgeRetention.cpp
69

I reordered element in the tuple. so that names comes last because this is what enabled the best algorithm for query.

Regarding ordering by pointer or not ordering at all. i think that this will make the output non-deterministic or relying on the allocator. if it is fine to for it to be non-deterministic. we could fix the printing to prevent spurious test failures. query function can deal with any order.

jdoerfert added inline comments.Jan 17 2020, 4:14 PM
llvm/lib/Transforms/Utils/KnowledgeRetention.cpp
69

I can see why we do not want the non-deterministic order so no sorting by pointer. Now the name thing is still an issue. We should not rely on names whatsoever, if changing a name influences the result of a query is not what we want because it leads to hard to debug interactions of things that are unrelated.

Tyker marked an inline comment as done.Jan 21 2020, 3:24 PM
Tyker added inline comments.
llvm/lib/Transforms/Utils/KnowledgeRetention.cpp
69

with the current query API the order by name doesn't affect the results. it is just used as a way to prevent non-determinism.

This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: MaskRay.EditedFeb 2 2020, 9:31 AM

It'd be nice if the description of 8ebe001553d0c204cc41f3dbc595031828b1f140 could mention what is different from previous attempts... Two previous reverts did not say why this patch was reverted.

Also, you can strip Subscribers: and Tags: #llvm lines from the description. They are not useful. I am using this script:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' | git commit --amend -F -
}
jdoerfert added a comment.EditedFeb 2 2020, 10:18 AM

It'd be nice if the description of 8ebe001553d0c204cc41f3dbc595031828b1f140 could mention what is different from previous attempts... Two previous reverts did not say why this patch was reverted.

Edit: Now I see.

Btw. We should also remove the [WIP] tag.

Tyker added a comment.Feb 2 2020, 10:26 AM

It'd be nice if the description of 8ebe001553d0c204cc41f3dbc595031828b1f140 could mention what is different from previous attempts... Two previous reverts did not say why this patch was reverted.

Also, you can strip Subscribers: and Tags: #llvm lines from the description. They are not useful. I am using this script:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' | git commit --amend -F -
}

sorry for all the commit and revert. some build bot have a compile error under gcc that I can't reproduce locally. so commits and reverts are kind of trial and error. if there is a better way do deal with this kind of issue please let me know.
I will strip the auto-generated parts of the comments from phab. thanks for the script.

It'd be nice if the description of 8ebe001553d0c204cc41f3dbc595031828b1f140 could mention what is different from previous attempts... Two previous reverts did not say why this patch was reverted.

Also, you can strip Subscribers: and Tags: #llvm lines from the description. They are not useful. I am using this script:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' | git commit --amend -F -
}

sorry for all the commit and revert. some build bot have a compile error under gcc that I can't reproduce locally. so commits and reverts are kind of trial and error. if there is a better way do deal with this kind of issue please let me know.
I will strip the auto-generated parts of the comments from phab. thanks for the script.

I usually ask on IRC if anyone knows / can reproduce the error.

Tyker added a comment.Feb 2 2020, 10:34 AM

It'd be nice if the description of 8ebe001553d0c204cc41f3dbc595031828b1f140 could mention what is different from previous attempts... Two previous reverts did not say why this patch was reverted.

Also, you can strip Subscribers: and Tags: #llvm lines from the description. They are not useful. I am using this script:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' | git commit --amend -F -
}

sorry for all the commit and revert. some build bot have a compile error under gcc that I can't reproduce locally. so commits and reverts are kind of trial and error. if there is a better way do deal with this kind of issue please let me know.
I will strip the auto-generated parts of the comments from phab. thanks for the script.

I usually ask on IRC if anyone knows / can reproduce the error.

i found a version of gcc on which i have the same error as the build bot. i should be fine from here.

fhahn added a comment.Feb 2 2020, 11:40 AM

Is this approach documented somewhere in-tree? I think it would be good to document it, rather than just having a link to the mailing list.

llvm/include/llvm/Transforms/Utils/KnowledgeRetention.h
9

I think it would be good to describe what kinds of information are preserved.

llvm/lib/Transforms/Utils/KnowledgeRetention.cpp
25

It would probably be good to document the members here.

26

Is this only used with attributes? If so, wouldn't it be better to use an Attribute*?

34

What is the use case for retaining an attribute without argument? Would also be good to mention.

Is this approach documented somewhere in-tree? I think it would be good to document it, rather than just having a link to the mailing list.

Lang Ref needs to be updated now. Agreed. @Tyker do you want to do that or should I?

llvm/include/llvm/Transforms/Utils/KnowledgeRetention.h
9

That is always hard to keep in sync. I would prefer a generic file comment and alternatively more details on the BuildAssumeFromInst description below.

fhahn added inline comments.Feb 2 2020, 12:16 PM
llvm/include/llvm/Transforms/Utils/KnowledgeRetention.h
9

Sure, a reference to the langref or something like that is fine. But as it reads now it does not give any kind of indication of what this can/should be used for.

jdoerfert added inline comments.Feb 2 2020, 12:34 PM
llvm/include/llvm/Transforms/Utils/KnowledgeRetention.h
9

The lang ref change was on my list anyway. Basically as part of the first in-tree use, which is the alignment assumption. We'll prioritize the lang ref patch now.

In addition we could add that some information to BuildAssumeFromInst that it retains llvm::Attribute information for now.

Tyker marked an inline comment as done.Feb 3 2020, 4:29 AM

Is this approach documented somewhere in-tree? I think it would be good to document it, rather than just having a link to the mailing list.

Lang Ref needs to be updated now. Agreed. @Tyker do you want to do that or should I?

this is needed. if you want i can do it.

llvm/include/llvm/Transforms/Utils/KnowledgeRetention.h
9

In addition we could add that some information to BuildAssumeFromInst that it retains llvm::Attribute information for now.

this restriction should be lifted at some point.