This is an archive of the discontinued LLVM Phabricator instance.

[Attributor] Use AAValueConstantRange to infer dereferencability.
ClosedPublic

Authored by kuter on Mar 15 2020, 9:00 PM.

Diff Detail

Event Timeline

kuter created this revision.Mar 15 2020, 9:00 PM
jdoerfert added inline comments.Mar 15 2020, 9:56 PM
llvm/lib/Transforms/IPO/Attributor.cpp
616 ↗(On Diff #250458)

Isn't this almost the stripAndAccumulate code? How hard would it be to pass some kind of AttributorInfo object to that function (optionally) which triggers the lookup. Or more generically, a callback that takes a Value and returns a lower bound offset (plus indicates success/failure).

llvm/test/Transforms/Attributor/dereferenceable-1.ll
234

Isn't this the fill_range code?

Thank you for working on this!!

As a high-level comment, please add full-context of diff(diff -U99999)

llvm/lib/Transforms/IPO/Attributor.cpp
511 ↗(On Diff #250458)

Add commnent here

llvm/test/Transforms/Attributor/dereferenceable-1.ll
214

I think optimaly %ptr is dereferenceable(30).
Please add FIXME here.

kuter updated this revision to Diff 251274.Mar 18 2020, 10:09 PM
kuter marked an inline comment as done.

Made the changes @jdoerfert asked for.

kuter added a comment.EditedMar 19 2020, 4:41 AM

@jdoerfert

This patch is not complete yet.

Currently it can only use the known information. I would like to make it possible for it to use the assumed information as well.
The problem with that is:

We need to iterate over the uses and take the maximum of the current dereferancability that is indicated by that use.
this works fine for the known info. But since the lower bound of the assumed constant range can decrease over time,
taking max with existing state is a problem.

The way that the AAFromMustBeExecutedContext is wired makes things a bit tricky and
I would like to get your opinions on this before I proceed.

bbn added a subscriber: bbn.Mar 19 2020, 6:53 AM

Can you explain why we need to catch overflows now but not before? I mean, the values determine by the external analysis are valid lower bounds, what is different from versioning on them and making them constant in one of the versions.

@jdoerfert

This patch is not complete yet.

Currently it can only use the known information. I would like to make it possible for it to use the assumed information as well.
The problem with that is:

We need to iterate over the uses and take the maximum of the current dereferancability that is indicated by that use.
this works fine for the known info. But since the lower bound of the assumed constant range can decrease over time,
taking max with existing state is a problem.

The way that the AAFromMustBeExecutedContext is wired makes things a bit tricky and
I would like to get your opinions on this before I proceed.

I think we should finish this one and then do the following:

We actually don't want the minimal offset but the maximal known offset, right? In addition to renaming the function we will check if the range from AAConstantRange is known, if so, we take the maximum instead.

llvm/include/llvm/IR/Operator.h
563

= nullptr is not sufficient? unfortunate.
Nit: -value +Offset, or remove the names.

Please describe what external analysis exactly does here.

I like the solution of adding this callback though. We need more of these soon.

llvm/lib/IR/Value.cpp
647

Put the entire thing in an if (ExternalAnalysis) please. Or better !ExternaAnalysis then old code, else this code.
TBH, I'm not sure why we don't bail on overflow in the old code as well, maybe we should. Can you check if that would make any test fail?

llvm/lib/Transforms/IPO/Attributor.cpp
552 ↗(On Diff #251274)

Style: UseAssumed, Range, Value, etc.

kuter marked 3 inline comments as done.Mar 19 2020, 9:36 PM
kuter added inline comments.
llvm/test/Transforms/Attributor/dereferenceable-1.ll
214

OK, I will.

For us to be able to deduce a 30, there would have to be separate mechanism that deduces the biggest number
that the "Value" in question is going to be guaranteed to take.

Perhaps we could use SCEV ?

234

No it is not. Currently this patch does not work with for loops.

This is because with for loops AAFromMustBeExecutedContext looks at the branch at the top;
calls the followUse() on both of the successors and it "and"s them together.

This is probably the reason why only a single test is affected by this patch.

I will address this issue with a separate patch that passes LoopInfo, DominatorTree and the PostDominatorTree
to the MustBeExecutedContextExplorer.

kuter added a comment.EditedMar 19 2020, 10:16 PM

Can you explain why we need to catch overflows now but not before? I mean, the values determine by the external analysis are valid lower bounds, what is different from versioning on them and making them constant in one of the versions.

@jdoerfert

This patch is not complete yet.

Currently it can only use the known information. I would like to make it possible for it to use the assumed information as well.
The problem with that is:

We need to iterate over the uses and take the maximum of the current dereferancability that is indicated by that use.
this works fine for the known info. But since the lower bound of the assumed constant range can decrease over time,
taking max with existing state is a problem.

The way that the AAFromMustBeExecutedContext is wired makes things a bit tricky and
I would like to get your opinions on this before I proceed.

I think we should finish this one and then do the following:

We actually don't want the minimal offset but the maximal known offset, right? In addition to renaming the function we will check if the range from AAConstantRange is known, if so, we take the maximum instead.

void internal test(int index, char *ptrArg) {
  ptrArg[index] = 'A';
}

test(10, ptrA);
test(30, ptrB);
test(40, ptrC);

Based on call sites the index argument of the test function would be in range [10, 40] right ?
but marking ptrArg dereferancable(41) would be wrong woudn't it ?

The reason that I made the stripAndAccumulateConstantOffsets overflow aware is that the range can be lower than what it can be in reality.
I thought that these differences can result in unintended underflows.

For accumulateConstantOffset, I think for non inbound GEP's overflows should be ok.
but for stripAndAccumulateConstantOffsets bailing out should probably be the default behaviour.

Can you explain why we need to catch overflows now but not before? I mean, the values determine by the external analysis are valid lower bounds, what is different from versioning on them and making them constant in one of the versions.

@jdoerfert

This patch is not complete yet.

Currently it can only use the known information. I would like to make it possible for it to use the assumed information as well.
The problem with that is:

We need to iterate over the uses and take the maximum of the current dereferancability that is indicated by that use.
this works fine for the known info. But since the lower bound of the assumed constant range can decrease over time,
taking max with existing state is a problem.

The way that the AAFromMustBeExecutedContext is wired makes things a bit tricky and
I would like to get your opinions on this before I proceed.

I think we should finish this one and then do the following:

We actually don't want the minimal offset but the maximal known offset, right? In addition to renaming the function we will check if the range from AAConstantRange is known, if so, we take the maximum instead.

void internal test(int index, char *ptrArg) {
  ptrArg[index] = 'A';
}

test(10, ptrA);
test(30, ptrB);
test(40, ptrC);

Based on call sites the index argument of the test function would be in range [10, 40] right ?
but marking ptrArg dereferancable(41) would be wrong woudn't it ?

Correct. First though was: we need something special for the loop case so we know we reach the upper bound. After all, that is probably the most important case to get.

The reason that I made the stripAndAccumulateConstantOffsets overflow aware is that the range can be lower than what it can be in reality.
I thought that these differences can result in unintended underflows.

I see. Agreed.

For accumulateConstantOffset, I think for non inbound GEP's overflows should be ok.
but for stripAndAccumulateConstantOffsets bailing out should probably be the default behaviour.

OK.

llvm/test/Transforms/Attributor/dereferenceable-1.ll
234

Right. That reminds me of D64974, though I'm unsure if we still need it. What we need is to make the explorer aware of CanProveNotTakenFirstIteration. Similar to the use of that function that exists, we can go to the non-exit block from a header in getMustBeExecutedNextInstruction (or findForwardJoinPoint) if we know that edge is taken at least once. To not loose the code after the loop we should add a stack of unexplored edges. The exit edge goes there if the loop is known not to be endless (see findForwardJoinPoint). If we are out of forward instructions we can pop an edge from the stack and continue. Alternatively, we could check if the loop was not endless when we visit the header for the second time.

kuter updated this revision to Diff 251822.EditedMar 20 2020, 10:11 PM

Fixed styling, Added FIXME, Sperated new code.

kuter marked 2 inline comments as done.Mar 20 2020, 10:13 PM
kuter marked an inline comment as done.Mar 20 2020, 10:16 PM
uenoku added inline comments.Mar 21 2020, 4:19 AM
llvm/test/Transforms/Attributor/willreturn.ll
513

Why can't we get dereferenceable(4) for p in this case?

kuter marked an inline comment as done.Mar 21 2020, 5:40 AM
kuter added inline comments.
llvm/test/Transforms/Attributor/willreturn.ll
513

Ok for this example specifically, the variable n can be negative so the ans +=p[n] may never be executed.

But in general, this patch doesn't work with loops that are not do {} while (cond)

This is because AAFromMustBeExecutedContext is not aware when the first iteration is always going to be ran.

This needs to be addressed with a separate path and would probably improve many other deductions as well.

baziotis added inline comments.Mar 21 2020, 6:13 AM
llvm/test/Transforms/Attributor/willreturn.ll
513

IIUC, loop rotation can help here because it provides this guarantee.

uenoku added inline comments.Mar 21 2020, 7:32 AM
llvm/test/Transforms/Attributor/willreturn.ll
513

Oh, I thought %n is assumed to be positive:) Thanks.

uenoku added inline comments.Mar 21 2020, 7:46 AM
llvm/test/Transforms/Attributor/dereferenceable-1.ll
214

For us to be able to deduce a 30, there would have to be separate mechanism that deduces the biggest number

that the "Value" in question is going to be guaranteed to take.
Agreed.

Perhaps we could use SCEV ?

I think so, what I thought once is:
Assume that a loop is in the context and it is guaranteed to proceed to the last iteration, we can use the biggest number of the value. For inbounds gep, it is allowed to take that number as dereferenceable bytes, and for non-inbounds gep, if SCEV of the value is something like <0, +, 1>, we can take the biggest number as derefrenceable bytes.

This is because AAFromMustBeExecutedContext is not aware when the first iteration is always going to be ran.
This needs to be addressed with a separate path and would probably improve many other deductions as well.

Agreed and agreed.

Perhaps we could use SCEV ?

[...]

We could. What we want (regardless of how) is to track in AAConstantRange (conceptually) two ranges: (1) potential value range, (2) known value range. This is not the same as the assumed/known we track now. Right now bot track the potential value range, that is what value can this llvm::Value potentially have at runtime. I think tracking the second, thus, what value is this llvm::Value going to have at runtime (if it is executed) could be tracked in the same AA or a different one, depending on how much logic we can share. This is also a follow up patch (or two).


I'll have to go through the logic in this patch tomorrow (=rested) but I think it looks pretty good.

kuter added a comment.EditedMar 23 2020, 11:13 PM

We could. What we want (regardless of how) is to track in AAConstantRange (conceptually) two ranges: (1) potential value range, (2) known value range. This is not the same as the assumed/known we track now. Right now bot track the potential value range, that is what value can this llvm::Value potentially have at runtime. I think tracking the second, thus, what value is this llvm::Value going to have at runtime (if it is executed) could be tracked in the same AA or a different one, depending on how much logic we can share. This is also a follow up patch (or two).

Agreed.

Currently AAConstantRange tracks the range that a "llvm::value" is guaranteed to be within in runtime.
But for loops, this is results in information loss.

So I think what you suggest is that we track a specific range for loop like situations.
for an actual loop this would be, the first and the last index of the counter.

and when we are propagating this range from phi nodes, select instructions, call sites (of internal functions) we would intersect them together.

For example:

void test_use(char  *ptrA, char *ptrB) {
  for (int i = 10; i < 100; i++) {
    test(ptrA, i); //i is in [10, 99] loop range.
  }
  for (int i = 5; i < 150; i++) {
    test(ptrB, i); //i is in [5, 149] loop range.
  }
}

//i is in [10, 99] loop range
//ptr is dereferencable(100)
void internal test(char *ptr, int i) {
  ptr[i] = 'A';
}

We should finish this review and then focus on the next steps. I added more comments but I think the general logic is fine.

(Next step would be to rename the range in AAConstantRange to PossibleRange, or similar, and add a second range, e.g., ObservedRange which we need to deduce. We can then use the max(PossibleRange.minimum(), ObservedRange.maximum()) for dereferenceable deduction.

llvm/lib/IR/Operator.cpp
94

Style: I think it would be easier to read if you split the above if-else-cascade and use early exits:

 MinimalIndex = ConstOffset->getValue();
 continue;
}

// The operand is not constant, check if an external analysis was provided. 
if (!ExternalAnalysis)
  return false;

Do we need to track overflow *before* we use an external analysis? If not we don't need 75/76. Below we can than use UsedExternalAnalysis in 88 and remove it from 95. We should also be able to use a single overflowed flag.

112

Put the simple case first, maybe add a continue to avoid the "else"

llvm/lib/IR/Value.cpp
652

Simple case first please.

llvm/lib/Transforms/IPO/Attributor.cpp
552 ↗(On Diff #251822)

Some names are still starting with a lower case letter.

Do we have a test where the assumed range minimum is negative?

3749 ↗(On Diff #251822)

Merge in a single debug message and line.

3803 ↗(On Diff #251822)

No need for the newline in the beginning. TBH, the entire message doesn't help much given that we see the update debug message already.

kuter updated this revision to Diff 252574.Mar 25 2020, 7:44 AM

Eliminated redundant debug messages, Style fixes, Added negative test case,
Don't use external analysis if the get operand is a struct type.

kuter marked an inline comment as done.Mar 25 2020, 8:22 AM
kuter added inline comments.
llvm/lib/IR/Operator.cpp
94

We probably don't need to detect a overflow before the use of external analysis I was just being safe.

I am not sure what you mean by the early exit change.
We need to detect overflows that happen after we use ExternalAnalysis even if it was not caused by the value that the ExternalAnalysis have returned.

So the value of the ConstantOffset needs to pass through something that would detect a overflow if ExternalAnalysis is present and it has been used.

We could do it with a lambda like this:

  AddArrayIndex(ConstOffset->getValue());
  continue;
}

Also do we really need to have the old and the new code in this style:

if (!External Analysis) {
  //old
} else {
 //new
}

New code shouldn't really behave any different other than detecting overflows/underflows.

kuter marked 2 inline comments as done.Mar 25 2020, 9:59 AM
jdoerfert added inline comments.Mar 25 2020, 10:10 AM
llvm/lib/IR/Operator.cpp
94
  1. When we want to check UsedExternal in order to determine if an overflow check is necessary not ExternalAnalysis.
  2. We should create the lambda you mentioned.
  3. Don't track overflowed outside of the lambda. In the lambda check if usedexternal is set, only if we track and act on overflows.
kuter updated this revision to Diff 252668.Mar 25 2020, 2:13 PM
kuter marked an inline comment as done.

Simplfy accumulateConstantOffset

kuter marked 3 inline comments as done.Mar 25 2020, 2:15 PM
kuter marked an inline comment as done.Mar 26 2020, 6:32 AM
jdoerfert accepted this revision.Mar 26 2020, 9:35 AM

Two more minor comments, other than that the code looks good. Please update so I can commit it.

llvm/lib/IR/Operator.cpp
81

Nit: Don't call it MinimalIndex as there might be use cases that over-approximate the number. Index is just fine I think.

llvm/lib/Transforms/IPO/Attributor.cpp
566 ↗(On Diff #252668)

Nit: AllowNonInbounds and `Value above. Check other variable names as well.

This revision is now accepted and ready to land.Mar 26 2020, 9:35 AM
kuter updated this revision to Diff 253022.Mar 26 2020, 5:28 PM

Apologies for the delay, can you rebase this and provide me with "Firstname Lastname <email>" from you so I can attribute it to you?

kuter updated this revision to Diff 256549.Apr 10 2020, 4:31 AM

Rebased.
Small logic change in GEPOperator::accumulateConstantOffset to bailout on scalable vector types
except for when the offset is zero.
Not allowing zero breaks @test_accumulate_constant_offset_vscale_zero

see https://reviews.llvm.org/rGef64ba831194c7deac8882a325ea9bea64eb612a

kuter added a comment.EditedApr 10 2020, 4:34 AM

Apologies for the delay, can you rebase this and provide me with "Firstname Lastname <email>" from you so I can attribute it to you?

name, surname: Kuter Dinel
email: kuterdinel@gmail.com

This revision was automatically updated to reflect the committed changes.