This is an archive of the discontinued LLVM Phabricator instance.

[Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer
ClosedPublic

Authored by uenoku on Jul 29 2019, 9:43 AM.

Details

Summary

In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as dereferenceable or nonnull in these instructions, we can mark dereferenceable or nonnull in the argument definition:

  1. Memory instruction (similar to D64258)

Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.

define i64* @f(i64* %a) {
entry:
  %add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
  store i64 1, i64* %add.ptr, align 8
  ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
  1. Propagation from callsite (similar to D27855)

If deref or nonnull are known in call site parameter attributes we can also say that argument also that attribute.

declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);

define void @parent1(i8* %a, i8* %b, i8* %c) {
  call void @use3nonnull(i8* %b, i8* %c, i8* %a) 
; Above instruction is always executed so we can say that@parent1(i8* nonnnull %a, i8* nonnull %b, i8* nonnull %c)
  call void @use3(i8* %c, i8* %a, i8* %b)
  ret void
}

Diff Detail

Event Timeline

uenoku created this revision.Jul 29 2019, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2019, 9:43 AM
xbolva00 added a subscriber: xbolva00.EditedJul 29 2019, 9:56 AM

What about

memcmp(a,b,16) ? Can we infer that a,b are deref(16)?

If so, D53342 should handle it (I need to update it)

uenoku added a comment.EditedJul 29 2019, 10:12 AM

What about

memcmp(a,b,16) ? Do we infer that a,b are nonnull and deref(16)?

Same for many libc functions..

It can't be inferred in current attributes set.

To infer this, I think we need to introduce a new attribute similar to allocsize something like:
dereferenceable_arg(argno): dereferenceable size depends on argno-th argument value
ex)

declare i32 @memcmp(i8* dereferenceable_arg(2), i8* dereferenceable_arg(2), i64)

If we add this attribute and annotate libc function, I think the deduction is not so hard.

xbolva00 added a comment.EditedJul 29 2019, 10:32 AM

Thanks for confirmation that it would be okay to add.

I think we do not need 'dereferenceable_arg' for C libcalls since we can handle only constant sizes (memcmp/memcpy/memmove(p1, p2, C) where C > 0) anyway (if inteested, you can follow older discussion in D53342 :)).

Edit: but as an general idea, dereferenceable_arg seems like a nice idea :)

I thought about this some more. I think you should move the explorer into the InformationCache and expose a query method like this:

void exploreMustExecuteContext(Instruction &I, SmallVectorImpl<Instruction *> &Worklist, std::function <bool(Instruction &I, bool /* MustBeExecutedWith */)> &QueryResultCallback);

The idea is:

  • The InformationCache will make sure that the explore iterator is "incremented" if needed, e.g., intiially just run it until there is no more instruction to be found.
  • The worklist will be populated by interesting instructions, most of the timer (transitive) users.
  • The callback informs of the result and allows to re-populate the worklist. If it returns false, the exploration is aborted.
  • The InformationCache will pop an instruction from the worklist, check if the explore iterator "contains" it, potentially increases the explored context, and return the result through the callback.

I think this will work well for most attributes we have now, what are your thoughts?

Also, do this exploration stuff for now only in the initialize method.

llvm/test/Transforms/FunctionAttrs/nosync.ll
100

volatile loads/stores do not imply dereferenceability.

jdoerfert added inline comments.Jul 29 2019, 11:19 PM
llvm/lib/Transforms/IPO/Attributor.cpp
3027

Similar to D64258, a follow up patch could keep track of all non-inbounds accessed bytes, build a vector that marks each of them starting at the base pointer, and then deriving dereferenceability as the number of consecutive accessed bytes.

The idea is:

  • The InformationCache will make sure that the explore iterator is "incremented" if needed, e.g., intiially just run it until there is no more instruction to be found.
  • The worklist will be populated by interesting instructions, most of the timer (transitive) users.
  • The callback informs of the result and allows to re-populate the worklist. If it returns false, the exploration is aborted.
  • The InformationCache will pop an instruction from the worklist, check if the explore iterator "contains" it, potentially increases the explored context, and return the result through the callback.

    I think this will work well for most attributes we have now, what are your thoughts?

I think this is reasonable.

jdoerfert added a comment.EditedJul 30 2019, 3:01 PM

The idea is:

  • The InformationCache will make sure that the explore iterator is "incremented" if needed, e.g., intiially just run it until there is no more instruction to be found.
  • The worklist will be populated by interesting instructions, most of the timer (transitive) users.
  • The callback informs of the result and allows to re-populate the worklist. If it returns false, the exploration is aborted.
  • The InformationCache will pop an instruction from the worklist, check if the explore iterator "contains" it, potentially increases the explored context, and return the result through the callback.

    I think this will work well for most attributes we have now, what are your thoughts?

I think this is reasonable.

Alternatively, we could do the following in the explorer (which I now find much nicer):

Given a predicate and an program point PP (=instruction), check if that predicate holds for sure if PP is executed. The nice part would be we move the logic there and we could handle "paths" later.
E.g., pred(A) and pred(B) are true, the method could say true for something like PP; if (...) A; else B;

If you want you could look into this, otherwise I will.

uenoku updated this revision to Diff 212743.Aug 1 2019, 12:25 AM
uenoku edited the summary of this revision. (Show Details)

Address comment.

  • Add checkPredicateAfterInstruction in Explorer.
  • Add tests in nonnull about path exploration.
jdoerfert accepted this revision.Aug 1 2019, 9:11 AM

One nit, and one problem that can be easily fixed and needs to be tested.
Otherwise, LGTM.

llvm/lib/Transforms/IPO/Attributor.cpp
466

make helpers static, also below. I somehow thought this function exists somewhere but I might be wrong.

3023

This is not correct, I think. We should add a test for sure.

The problem is that there could be bitcasts between base and I which cause the accessed bytes to be less than the base type would suggest.

So

i32* %A
load i8, i8* (bitcast i32* %A to i8*)

should cause deref(1) on %A but not deref(4)

This revision is now accepted and ready to land.Aug 1 2019, 9:11 AM

Alternatively, we could do the following in the explorer (which I now find much nicer):

Given a predicate and an program point PP (=instruction), check if that predicate holds for sure if PP is executed. The nice part would be we move the logic there and we could handle "paths" later.
E.g., pred(A) and pred(B) are true, the method could say true for something like PP; if (...) A; else B;

Regarding this, it seems very interesting and I uploaded a prototype for deduction using paths(D65593).

uenoku updated this revision to Diff 213785.Aug 6 2019, 8:13 PM

Rebase and address comment. A Test is added to dereferenceable.ll.

Hi,

So with this patch and enabled Attributor I expect that arguments will be annotated with dereferenceable(16) (propagated from callsite), right?

My IR:

define dso_local void @alias(i8* nocapture %s, i8* nocapture %p) local_unnamed_addr #0 {
entry:

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias align 1 dereferenceable(16) %s, i8* noalias align 1 dereferenceable(16) %p, i64 16, i1 false)
%0 = load i8, i8* %s, align 1, !tbaa !2
%arrayidx = getelementptr inbounds i8, i8* %p, i64 3
store i8 %0, i8* %arrayidx, align 1, !tbaa !2
%1 = load i8, i8* %s, align 1, !tbaa !2
%arrayidx1 = getelementptr inbounds i8, i8* %p, i64 4
store i8 %1, i8* %arrayidx1, align 1, !tbaa !2
ret void

}

(and do you plan to propagate noalias too?)

@xbolva00
Thank you for your comment!

So with this patch and enabled Attributor I expect that arguments will be annotated with dereferenceable(16) (propagated from callsite), right?

Yes.

(and do you plan to propagate noalias too?)

Yes. I'll do it. I'm going to work on other attributes too but I want to generalize this kind of deduction. I mean, AbstractState in the same MustBeExecutedContext can share their (known) information.

Super! Thanks.

uenoku updated this revision to Diff 215186.EditedAug 14 2019, 12:19 PM

Refactor

  • Create accumulatePredOnMustBeExecutedContext as Attributor member function
  • Look at known information of Argument in the callsite argument updateImpl. It is not allowed to look at assumption in argument but I think it is sound to look at known information.
  • Use AbstractState operator to combine state.

This patch now fails Transforms/FunctionAttrs/read_write_returned_arguments_scc.lldue to Attributor verification.
I think this is because in some case, CallSiteArgumentAttribute reaches pessimistic fixpoint faster than the function ArgumentAttribute gets known information.

To introduce backward propagation of known information, we need to fix this problem.

jdoerfert requested changes to this revision.Aug 14 2019, 1:57 PM
jdoerfert added inline comments.
llvm/include/llvm/Transforms/IPO/Attributor.h
904

Could we call it StateType, make the std::function a function_ref, and PP a reference please.

Also, I think it makes sense to have an early exit. Maybe make the predicate a function_ref<bool(const Instruction &, State &)> which updates the state in place if necessary and returns true as long as it wants to continue exploring.

llvm/lib/Transforms/IPO/Attributor.cpp
564

Very cool, but make it static or put it in an anonymous namespace.

1725

I thought about this some more:

  1. Should we do this traversal for nonnull explicitly or just do it for deref and then have nonnull ask the deref attribute if nonnul is implied. (for now most use cases will not have null as a valid pointer)
  2. We should not traverse the context of the same instruction multiple times, or at least not in every update. What we could do is run this in the initialize and remember the AA's that would imply the current one (line 1202). In the update we then only check if any of the implied ones is *known* if not we try to determine it the usual way.

( &getAnchorScope().getEntryBlock().front(), will become getCtxI())

llvm/test/Transforms/InferFunctionAttrs/dereferenceable.ll
11

Could we have more versions of this test:

Positive:

  • only the access to %arrayidx3 should suffice
  • all accesses but without the inbounds keyword

Negative:

  • only the access to %arrayidx3 without the inbound keyword
This revision now requires changes to proceed.Aug 14 2019, 1:57 PM
xbolva00 added a comment.EditedAug 15 2019, 4:04 AM

Hello folks,

Yesterday I came up with idea to annonate allocation functions with deref_or_null.

malloc(40) -> annotate return value of malloc with deref_or_null(40) -> Good, I will work on this idea.

Now I present some ideas for Attributor (maybe following cases already work.. I didnt check them).

Use case 1:
p = malloc(40) deref_or_null(40)
p[1] = 100;
here, we can change deref_or_null(40) to deref(40) since if p was null, it is UB.

Use case 2:
p = malloc(40) // deref_or_null(40)
if (p) {

in this block p is now  deref(40)! cool.

}

Use case 3
p = malloc(40) // deref_or_null(40)
if (someotherbool) {

p[1] = 100; // here, we can change  deref_or_null(40) to  deref(40) since if p was null, it is UB.

}

What do you think? Can attributor do it? I think use case 1 is doable, 2 and 3 a bit harder ones.
Maybe @reames is interested too in this area? Please comment this idea :)

I think precise deref info can enable really neat things, see https://bugs.llvm.org/show_bug.cgi?id=43003.

Hello folks,

Yesterday I came up with idea to annonate allocation functions with deref_or_null.

malloc(40) -> annotate return value of malloc with deref_or_null(40) -> Good, I will work on this idea.

Now I present some ideas for Attributor (maybe following cases already work.. I didnt check them).

Use case 1:
p = malloc(40) deref_or_null(40)
p[1] = 100;
here, we can change deref_or_null(40) to deref(40) since if p was null, it is UB.

This should work with these patches, yes. (It will be deref(40) if nullptr is not a valid pointer else it is deref(2).)

Use case 2:
p = malloc(40) // deref_or_null(40)
if (p) {

in this block p is now  deref(40)! cool.

}

Yes, this will work at some point soon, it might already if the nonnull query is already using the "context instruction". However, there might not be a way to annotate the information. Passes will soon (after the series that ends in D66276) be able to build a Attributor and query a single property, e.g., the dereferenceability property of a pointer inside some control. We might also think about better dereferenceability annotations, maybe an intrinsic. This also ties in with the new deref and deref_globally (D61652).

Use case 3
p = malloc(40) // deref_or_null(40)
if (someotherbool) {

p[1] = 100; // here, we can change  deref_or_null(40) to  deref(40) since if p was null, it is UB.

}

Similar to above.

What do you think? Can attributor do it? I think use case 1 is doable, 2 and 3 a bit harder ones.

All are well in the scope of what we (will soon) have.

Maybe @reames is interested too in this area? Please comment this idea :)

I think precise deref info can enable really neat things, see https://bugs.llvm.org/show_bug.cgi?id=43003.

Once these patches are in we need to switch to deref and deref_globally (D61652). We should also consider using deref_globally (or sth similar) to indicate that all existing accesses will be dereferenceable, e.g.,
if we have an alloc or a malloc for which we can show it returns sth dereferenceable and the pointer is not used after the free.

Good to know, thanks for info!

Hi,
my another question :)
clang -std=c99 -Ofast -DNDEBUG -S -emit-llvm -mllvm -enable-nonnull-arg-prop -mllvm -attributor-disable=false code.c

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(%struct.tHashTableItem* nocapture readonly %symbol) local_unnamed_addr #2 {
entry:

%call = tail call noalias i8* @malloc(i64 64) #8
%0 = bitcast i8* %call to %struct.tHashTableItem*
%1 = bitcast %struct.tHashTableItem* %symbol to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(64) %call, i8* nonnull align 8 dereferenceable(64) %1, i64 64, i1 false)
%cmp = icmp eq i8* %call, null

Curently LLVM fails to propagate attributes to arguments thru bitcasts, it seems. With this patch help with this case?

xbolva00 added a comment.EditedAug 20 2019, 3:08 PM

Uhm.

Even simple case with "-mllvm -enable-nonnull-arg-prop" does not work.. strange. Hopefully with Attributor it would work.

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(i8* nocapture readonly %s) local_unnamed_addr #0 {
entry:

%call = tail call noalias i8* @calloc(i64 1, i64 64) #8
%call.i = tail call noalias i8* @calloc(i64 1, i64 64) #8
%Index.i = getelementptr inbounds i8, i8* %call.i, i64 24
%0 = bitcast i8* %Index.i to i32*
store i32 -1, i32* %0, align 8, !tbaa !14
%bcmp = tail call i32 @bcmp(i8* nonnull dereferenceable(7) %call.i, i8* nonnull dereferenceable(7) %s, i64 7)
uenoku updated this revision to Diff 216455.EditedAug 21 2019, 1:19 PM

Address some comments and change to structured deduction. I'll add new test mentioned in the comment later.

Hi,
my another question :)
clang -std=c99 -Ofast -DNDEBUG -S -emit-llvm -mllvm -enable-nonnull-arg-prop -mllvm -attributor-disable=false code.c

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(%struct.tHashTableItem* nocapture readonly %symbol) local_unnamed_addr #2 {
entry:

%call = tail call noalias i8* @malloc(i64 64) #8
%0 = bitcast i8* %call to %struct.tHashTableItem*
%1 = bitcast %struct.tHashTableItem* %symbol to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(64) %call, i8* nonnull align 8 dereferenceable(64) %1, i64 64, i1 false)
%cmp = icmp eq i8* %call, null

Curently LLVM fails to propagate attributes to arguments thru bitcasts, it seems. With this patch help with this case?

Uhm.

Even simple case with "-mllvm -enable-nonnull-arg-prop" does not work.. strange. Hopefully with Attributor it would work.

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(i8* nocapture readonly %s) local_unnamed_addr #0 {
entry:

%call = tail call noalias i8* @calloc(i64 1, i64 64) #8
%call.i = tail call noalias i8* @calloc(i64 1, i64 64) #8
%Index.i = getelementptr inbounds i8, i8* %call.i, i64 24
%0 = bitcast i8* %Index.i to i32*
store i32 -1, i32* %0, align 8, !tbaa !14
%bcmp = tail call i32 @bcmp(i8* nonnull dereferenceable(7) %call.i, i8* nonnull dereferenceable(7) %s, i64 7)

These cases are not so complex that I think this patch can handle.

uenoku marked 3 inline comments as done.Aug 21 2019, 1:35 PM

Happy to hear :)

uenoku updated this revision to Diff 216552.Aug 21 2019, 11:28 PM

Fix dererefenceable

uenoku updated this revision to Diff 216614.Aug 22 2019, 7:22 AM

Refactor

uenoku marked an inline comment as done.Aug 22 2019, 7:42 AM
uenoku added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
1725

In the current patch, context is traversed only in initialize and collect *interesting* AA. In updateImpl, these AAs will be merged into the state.

uenoku updated this revision to Diff 216645.Aug 22 2019, 9:34 AM

Fix broken patch.

jdoerfert added inline comments.Aug 22 2019, 2:45 PM
llvm/lib/Transforms/IPO/Attributor.cpp
750

I have the feeling the logic you add here works not only for arguments but other positions too, right? If so, we should make it a helper class that can be reused. In the generic case you can use getIRPosition().getCtxI() for the program point.

I'm also still unsure if we should not somehow "filter" the explored instructions based on the uses of the associated value. Maybe we could do the following:

  • In accumulatePredOnMustBeExecutedContext you take another method that yields "interesting" instructions we would like to visit if they are in the must-be-executed-context (MBECtx).
  • The function for deref would look at the uses of the associated value and ask for the users. If the user was found, it can transitively ask for its user (if it wants to). If the user was not found, we do not go further (would require path exploration).
  • If the asked for user is not in the visited set of the MBECtxIterator, we visit more instructions until we explored the context or we found the instruction we are looking for.

Does this make sense?

uenoku updated this revision to Diff 217078.Aug 25 2019, 11:07 PM

Apply to call site argument or floating value also.

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

llvm/include/llvm/Analysis/ValueTracking.h
259 ↗(On Diff #217078)

You can commit this separatly. It looks good to me.

llvm/lib/Transforms/IPO/Attributor.cpp
707

Make this just StateType

758

Why can't we do this in the initialize?

uenoku marked an inline comment as done.Aug 26 2019, 9:23 PM

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

Tracking transitive uses is also good to me but I'm not sure there is an API to (efficiently) determine whether use of the associated value belongs to the context. Isn't it same to iterator over the instructions in the context?

llvm/lib/Transforms/IPO/Attributor.cpp
758

In the registerAA, AA on-demand created may be overwritten. I'm not sure whether it is intended or bug. Then, I wanted to avoid to store the address of AAs in initialize.

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

Tracking transitive uses is also good to me but I'm not sure there is an API to (efficiently) determine whether use of the associated value belongs to the context. Isn't it same to iterator over the instructions in the context?

Given a use U, check if the user is an instruction dyn_cast<Instruction>(U.getUser()) and ask the iterator if it is contained (It.count(...)). If the result is false we can try to explore further but let's start by exploring the context to the fullest and then doing the use & user stuff

Is this blocked on anything right now? I would really like this capability :)

uenoku added a comment.EditedSep 20 2019, 7:09 AM

I got it. I'll rebase it within a week :)

uenoku updated this revision to Diff 222008.Sep 26 2019, 12:28 PM

Rebase and change the way to track (transitive) uses. Some test changes are missing.

I like this version a lot better. Selectively looking into the context is the right choice (I think). I added some comments, when the tests are updated I'll give it a last look over but this looks almost ready to me.

llvm/lib/Transforms/IPO/Attributor.cpp
713

Please describe the type of followUse in more detail here. (what arguments, what do they mean, that the update logic should go in there, what not to do, etc.)

750

I would have though we need to advance the explorer iterator somewhere?

2486

This code looks like the other followUse. Can we have a helper like getKnownNonNullAndDerefBytesForUse which we call from both? (Maybe that also is reusable/combinable with the existing logic we have)

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
189

I think we lack a return value only test, see below.

define i32* @f7_3() {
; ATTRIBUTOR: define nonnull dereferenceable(4) i32* @f7_3()
  %ptr = tail call i32* @unkown_ptr()
  store i32 10, i32* %ptr, align 16
  ret i32* %ptr
}
uenoku updated this revision to Diff 222354.Sep 30 2019, 12:00 AM

Address comment

uenoku updated this revision to Diff 222355.Sep 30 2019, 12:01 AM

Minor update.

Hopefully last round of comments.

llvm/lib/Transforms/IPO/Attributor.cpp
751

This also only works for the first time the explorer iterator is created, the contains call before was the right way (as long as the explorer works the way it does now).

You can do sth like:

for (const Use *U : Uses) {
  const Instruction *UserI = cast<Instruction>(U->getUser());
  auto EIt = Explorer.begin(CtxI), EEnd = Explorer.end(CtxI);
  bool Found = EIt.contains(UserI);
  while (!Found && ++EIt != EEnd)
     Found = EIt.getCurrentInst() == UserI;
  if (Found && Base::followUse(A, U, UserI))
    for (const Use &Us : UserI->uses())
              Uses.insert(&Us);
}
1553

This is only true, I think, if null is not a valid pointer.

1566

Reading this line it was at first not clear to me that I has to be an access for this to return a base pointer.

1607

Do we want this change or was it just for debugging? Shouldn't it already say "nonnull [fix]" if it is known?

1635

Change is not used. You could check if it is fixed/known after this update and not do the stuff below in that case.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
197

Is this derived or not? There is the same string after the FIXME and ATTRIBUTOR, right?

uenoku updated this revision to Diff 222626.Oct 1 2019, 8:16 AM
uenoku marked 8 inline comments as done.

Address comments.

llvm/lib/Transforms/IPO/Attributor.cpp
1566

What does it mean?

1607

It was just for debugging.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
197

This is derived.

Nice, +1

Is this the last remaining work before you turn the attributor on?

jdoerfert added a comment.EditedOct 1 2019, 9:22 AM

One minor and one not so minor comment inlined. That is the last problem I found and once fixed this is fine :)

Nice, +1

Is this the last remaining work before you turn the attributor on?

No, tuning is still needed, right now we focused on applicability and aggressive optimizations. I fixed all bugs I found in the TS and SPEC2006 the other day but I'll have to do that again with more benchmarks soon. This patch is important to fix various issues, one of which is the dereferenceability problem we have (D61652). I'll work on tuning and measuring the Attributor over the next two weeks, present results at the Dev-Meeting, and hopefully I'm able to propose to turn it on by then.

llvm/lib/Transforms/IPO/Attributor.cpp
1566

That means we might want to add a comment here or rename getBasePointerOfPointerOperand to something like getBasePointerOfAccessPointerOperand.

1571

Don't we have to check if offset is negative? Do we have a test?

something like A[-2] = 0; should not cause deref bytes. E.g., we might need to use 64 bit and signed version of deref bytes. but non-null can be set as soon as we know there is an access and null is not a valid pointer (probably need tests for these as well).

I thought we have a helper somewhere to deal with this offset and access size logic already?

uenoku updated this revision to Diff 223189.Oct 4 2019, 5:03 AM

Rename method name.

uenoku updated this revision to Diff 223206.Oct 4 2019, 6:36 AM

Minor update.

uenoku updated this revision to Diff 223223.Oct 4 2019, 7:54 AM

Add test for minus index.

uenoku marked an inline comment as done.Oct 4 2019, 9:31 AM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

jdoerfert accepted this revision.Oct 4 2019, 1:52 PM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I inlined one comment, if you agree with that one and the proposed fix, the rest looks good to me. If not, let me know.

llvm/lib/Transforms/IPO/Attributor.cpp
1575

I'm still unsure about this logic, correct me if I'm wrong but we have:

`Base = Offset + I.getPointer()`

and we know due to the access I that there are D dereferenceable bytes with

`D = DL.getTypeStoreSize(getPointerOperand(I)->getType()->getPointerElementType());`

Now, deref from Base should be:

`max(0, D + Offset)`

which is the same we have AADereferenceableFloating::updateImpl, but with an offset in the other direction, I think.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
199

I don't think we can deduce much about %p but only about the return, so I would expect

`FIXME: This should be define nonnull dereferenceable(8) i32* @test_for_minus_index(i32* nonnull %p)`

or an explanation why %p is deref.

This revision is now accepted and ready to land.Oct 4 2019, 1:52 PM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function.

AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

This example is so simple that we can debug them but it is hard to debug more complex ones.

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function.

AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

This example is so simple that we can debug them but it is hard to debug more complex ones.

I see. You can explore uses exhaustively though that is a "local" solution to a general problem.
I think we need to keep known & assumed together but we should provide a way for AAs that have multiple deduction strategies to exhaust them seperatly, e.g, AAArgumentFromCallSiteArguments is known to have 2 schemes so we should track their "fixpoints" separate somehow.
In addition, or as an alternative, we could allow updates for AAs in a fixpoint if they opt-in to it. They would do so if they can improve based on known-information around them.

uenoku added a comment.Oct 7 2019, 1:12 AM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function.

AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

This example is so simple that we can debug them but it is hard to debug more complex ones.

I see. You can explore uses exhaustively though that is a "local" solution to a general problem.
I think we need to keep known & assumed together but we should provide a way for AAs that have multiple deduction strategies to exhaust them seperatly, e.g, AAArgumentFromCallSiteArguments is known to have 2 schemes so we should track their "fixpoints" separate somehow.
In addition, or as an alternative, we could allow updates for AAs in a fixpoint if they opt-in to it. They would do so if they can improve based on known-information around them.

Looks good. Anyway, the current code maintains soundness ( "general" solution might not be reached) so I'll commit it if there is no problem.

uenoku updated this revision to Diff 223854.Oct 8 2019, 7:30 AM

Rebase.

uenoku updated this revision to Diff 223867.Oct 8 2019, 7:57 AM
uenoku marked 2 inline comments as done.

Address the last comments.

llvm/lib/Transforms/IPO/Attributor.cpp
1575

I agree.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
199

It is my mistake. Fixed.

This revision was automatically updated to reflect the committed changes.

dereferenceable attribute is not added to the arguments?

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

I would expect

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly dereferenceable(16) %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly dereferenceable(16) %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

https://godbolt.org/z/if9rle

uenoku added a comment.EditedOct 9 2019, 8:59 AM

dereferenceable attribute is not added to the arguments?

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

I would expect

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly dereferenceable(16) %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly dereferenceable(16) %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

https://godbolt.org/z/if9rle

Actually, dereferenceable is added to the arguments if opt is executed for the above IR. https://godbolt.org/z/CItEJG

As far as I see the log, the problem is that InstCombine is located after Attributor Pass in -O3. Therefore, dereferenceable is not propagated.
Please take a look at the stderr output in https://godbolt.org/z/l_6aVE.

Solutions are to change the order or to run the Attributor several times.

This comment was removed by xbolva00.

Yeah, you probably want to run it multiple times.

@jdoerfert