Page MenuHomePhabricator

Top-Down FunctionAttrs propagation for noalias, dereferenceable and nonnull inference
Needs ReviewPublic

Authored by hfinkel on Jul 21 2014, 3:40 PM.

Details

Summary

This patch introduces a new module-level pass which does a top-down traversal of the call graph, inferring when possible for local functions, parameter attributes: noalias, dereferenceable and nonnull. It does this based on an examination of all call sites.

Inference on nonnull is straightforward (all callers must pass an argument we know is nonnull). noalias requires two things at all call sites: that the parameter not alias with any other parameter, and that it not have been captured prior to the call site. dereferenceability requires that we know the pointer is derefenceable at all call sites, obviously, but then we also need to get the size. We use the type size (which is fine b/c we've already verified that a load of that type is safe), but we also use getObjectSize to see if we can do better. We don't always know that the size that getObjectSize returns is valid (because the malloc could have failed, for example), but it will be valid if there is also an access, and isSafeToLoadUnconditionally checks for that.

There is a quirk about scheduling this in the current pass manager: Inference of noalias requires a capturing analysis which greatly benefits from running after the existing FunctionAttrs pass which will infer nocapture parameters. However, the existing FunctionAttrs pass is a CGSCC pass that runs in the main CGSCC pass manager after the inliner. But this being a top-down traversal of the call graph (and thus a module pass), and one that should run early, needs to run before the inliner's CGSCC pass manager. As a result, I think the best option is to insert an extra run of the regular FunctionAttrs pass just prior to this one, and these just prior to the main CGSCC pass manager. Since both preserve the call graph, it should not cause it to be recomputed.

I see no compile-time impact (either from the extra run on functionattrs or from this new pass). Compiling sqlite, for example, both passes take less than 0.1% of the total time.

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 11722.Jul 21 2014, 3:40 PM
hfinkel retitled this revision from to Top-Down FunctionAttrs propagation for noalias, dereferenceable and nonnull inference.
hfinkel updated this object.
hfinkel edited the test plan for this revision. (Show Details)
hfinkel added reviewers: chandlerc, reames.
hfinkel added a subscriber: Unknown Object (MLST).
hfinkel updated this revision to Diff 11731.Jul 21 2014, 5:08 PM

Use isDereferenceablePointer (which checks things like argument's dereferenceable attribute) in addition to using isSafeToLoadUnconditionally (which is mostly useful for its local instruction scan). Also fixed up one of the test cases.

hfinkel updated this revision to Diff 11749.Jul 22 2014, 5:54 AM
hfinkel added a reviewer: nicholas.

Updated based on Nick's comments; we need to ensure a stable answer for SCCs (which the df_iterator won't do).

The regular SCC iterator returns the SCCs bottom-up (it uses Tarjan's Algorithm, and knows it has found the root of an SCC as it passes it on the way up -- it is not clear there is a better way). Because we can't collect SCCs in a purely top-down fashion anyway, I've just changed it to store those returned by the bottom-up SCC iterator in a vector and visit this stored vector in reverse order. This pass does not modify the call graph, so this method should be stable.

In the future, we can get a better answer by speculating the attributes within each SCC, and then removing them afterward if call sites are found where the attributes can't be proved. I'll implement this as follow-up work.

hfinkel updated this revision to Diff 11803.Jul 22 2014, 11:43 PM

Also added support for inferring the align attribute (which, post r213670, is now a useful thing to do).

reames edited edge metadata.Sep 10 2014, 10:59 AM

Overall, looks pretty reasonable. A few minor code comments inline.

I might suggest separating the addition of the patch and adding to the pass manager. In particular, adding the second run of the attribute analysis should probably be discussed more broadly.

lib/Transforms/IPO/FunctionAttrsTD.cpp
82

Stylistically, I'd prefer this separated into two checks. Your combining correctness and profitability criteria. Also, rather than "return MadeChange", just "return false" for the early exits. It makes the flow clearer.

91

Please rename MaxSize. It appears to actually be MaxKnownDeferenceable and the current name is highly confusing.

112

A comment here that reminds the reader the default state is fully speculative/optimistic would be helpful.

114

Do you need to iterate the uses, or can you iterate the users directly?

125

Shouldn't this be F?

131

A general point: this function is *way* too complex. Split it into reasonable helper functions. One arrangement would be:
struct ArgState
updateArgAtCallSite(...) {}
updateForCallSite(...) {}
identifyCallSites(...) {}
addIndicatedAtrributes(...) {}

157

Is this a correctness bug? Or a missed optimization? I can't tell from your FIXME.

170

Er, this doesn't look right. We've verified that loading from the pointer *up to* a certain number of bytes is safe. We have not said anything beyond that.

175

It looks like IsDerefernceable could become a function of MaxSize on the state object?

184

I don't understand why you need this check. Is it safe to record an align attribute less than the ABI alignment? If not, that seems slightly odd.

236

I'm not clear why this check is needed. Aren't you only iterating 'number of argument' times anyways?

249

The first is just an early exit, but what are the second two checks for? Are they functionally required?

  • Original Message -----

From: "Philip Reames" <listmail@philipreames.com>
To: hfinkel@anl.gov, chandlerc@gmail.com, nicholas@mxc.ca, listmail@philipreames.com
Cc: llvm-commits@cs.uiuc.edu
Sent: Wednesday, September 10, 2014 12:59:48 PM
Subject: Re: [PATCH] Top-Down FunctionAttrs propagation for noalias, dereferenceable and nonnull inference

Overall, looks pretty reasonable. A few minor code comments inline.

I might suggest separating the addition of the patch and adding to
the pass manager. In particular, adding the second run of the
attribute analysis should probably be discussed more broadly.

Thanks for the review!

I also need to revisit the way that the noalias inference is done because it is not quite right in this patch. Just checking alias(AI, BI) == NoAlias is not sufficient: a[i] and a[i+1] don't alias, but obviously pointers based on them could. I can really only do this when we have distinct sets of underlying objects (which makes several other things much simpler -- including the capture checking and the ramifications on the optimization pipeline).

-Hal

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:82
@@ +81,3 @@
+ bool MadeChange = false;
+ if (F.isDeclaration() || !F.hasLocalLinkage() || F.arg_empty() ||
+ F.use_empty())


Stylistically, I'd prefer this separated into two checks. Your
combining correctness and profitability criteria. Also, rather than
"return MadeChange", just "return false" for the early exits. It
makes the flow clearer.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:91
@@ +90,3 @@
+ bool IsAlign;
+ uint64_t MaxSize, MaxAlign;
+


Please rename MaxSize. It appears to actually be
MaxKnownDeferenceable and the current name is highly confusing.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:112
@@ +111,3 @@
+ SmallVector<AttrState, 16> ArgumentState;
+ ArgumentState.resize(F.arg_size());
+


A comment here that reminds the reader the default state is fully
speculative/optimistic would be helpful.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:114
@@ +113,3 @@
+
+ for (Use &U : F.uses()) {
+ User *UR = U.getUser();


Do you need to iterate the uses, or can you iterate the users
directly?

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:125
@@ +124,3 @@
+ CallSite CS(cast<Instruction>(UR));
+ if (!CS.isCallee(&U))
+ return MadeChange;


Shouldn't this be F?

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:131
@@ +130,3 @@
+ Function::arg_iterator Arg = F.arg_begin();
+ for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i,
++AI, ++Arg) {
+ // If we've already excluded this argument, ignore it.


A general point: this function is *way* too complex. Split it into
reasonable helper functions. One arrangement would be:
struct ArgState
updateArgAtCallSite(...) {}
updateForCallSite(...) {}
identifyCallSites(...) {}
addIndicatedAtrributes(...) {}

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:157
@@ +156,3 @@
+ ArgumentState[i].IsDereferenceable = false;
+ FIXME: isSafeToLoadUnconditionally does not understand
memset.
+
FIXME: We can use getObjectSize for most things, but
for mallocs


Is this a correctness bug? Or a missed optimization? I can't tell
from your FIXME.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:170
@@ +169,3 @@
+ // trap, so we can use at least that size.
+ Size = std::max(Size, TypeSize);
+ }


Er, this doesn't look right. We've verified that loading from the
pointer *up to* a certain number of bytes is safe. We have not said
anything beyond that.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:175
@@ +174,3 @@
+ if (!ArgumentState[i].MaxSize)
+ ArgumentState[i].IsDereferenceable = false;
+ }


It looks like IsDerefernceable could become a function of MaxSize on
the state object?

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:184
@@ +183,3 @@
+ unsigned Align = getKnownAlignment(*AI, DL);
+ if (Align > DL->getABITypeAlignment(ETy))
+ ArgumentState[i].MaxAlign =


I don't understand why you need this check. Is it safe to record an
align attribute less than the ABI alignment? If not, that seems
slightly odd.

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:236
@@ +235,3 @@
+
+ bool HaveCand = false;
+ for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i)


I'm not clear why this check is needed. Aren't you only iterating
'number of argument' times anyways?

================
Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:249
@@ +248,3 @@
+ for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i, ++AI)
{
+ if (!ArgumentState[i].any() || AI->hasInAllocaAttr() ||
AI->hasByValAttr())
+ continue;


The first is just an early exit, but what are the second two checks
for? Are they functionally required?

http://reviews.llvm.org/D4609

  • Original Message -----

From: "Hal Finkel" <hfinkel@anl.gov>
To: reviews+D4609+public+b1f539b8c63822a5@reviews.llvm.org
Cc: llvm-commits@cs.uiuc.edu
Sent: Wednesday, September 10, 2014 1:13:01 PM
Subject: Re: [PATCH] Top-Down FunctionAttrs propagation for noalias, dereferenceable and nonnull inference

  • Original Message ----- > From: "Philip Reames" <listmail@philipreames.com> > To: hfinkel@anl.gov, chandlerc@gmail.com, nicholas@mxc.ca, > listmail@philipreames.com > Cc: llvm-commits@cs.uiuc.edu > Sent: Wednesday, September 10, 2014 12:59:48 PM > Subject: Re: [PATCH] Top-Down FunctionAttrs propagation for > noalias, dereferenceable and nonnull inference > > Overall, looks pretty reasonable. A few minor code comments > inline. > > I might suggest separating the addition of the patch and adding to > the pass manager. In particular, adding the second run of the > attribute analysis should probably be discussed more broadly.

    Thanks for the review!

    I also need to revisit the way that the noalias inference is done because it is not quite right in this patch. Just checking alias(AI, BI) == NoAlias is not sufficient: a[i] and a[i+1] don't alias, but obviously pointers based on them could. I can really only do this when we have distinct sets of underlying objects (which makes several other things much simpler -- including the capture checking and the ramifications on the optimization pipeline).

Actually, I take that back. It does not make the capture checking simpler.

-Hal

-Hal

>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:82
> @@ +81,3 @@
> + bool MadeChange = false;
> + if (F.isDeclaration() || !F.hasLocalLinkage() || F.arg_empty()
> ||
> + F.use_empty())
> ----------------
> Stylistically, I'd prefer this separated into two checks. Your
> combining correctness and profitability criteria. Also, rather
> than
> "return MadeChange", just "return false" for the early exits. It
> makes the flow clearer.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:91
> @@ +90,3 @@
> + bool IsAlign;
> + uint64_t MaxSize, MaxAlign;
> +
> ----------------
> Please rename MaxSize. It appears to actually be
> MaxKnownDeferenceable and the current name is highly confusing.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:112
> @@ +111,3 @@
> + SmallVector<AttrState, 16> ArgumentState;
> + ArgumentState.resize(F.arg_size());
> +
> ----------------
> A comment here that reminds the reader the default state is fully
> speculative/optimistic would be helpful.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:114
> @@ +113,3 @@
> +
> + for (Use &U : F.uses()) {
> + User *UR = U.getUser();
> ----------------
> Do you need to iterate the uses, or can you iterate the users
> directly?
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:125
> @@ +124,3 @@
> + CallSite CS(cast<Instruction>(UR));
> + if (!CS.isCallee(&U))
> + return MadeChange;
> ----------------
> Shouldn't this be F?
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:131
> @@ +130,3 @@
> + Function::arg_iterator Arg = F.arg_begin();
> + for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i,
> ++AI, ++Arg) {
> + If we've already excluded this argument, ignore it.
> ----------------
> A general point: this function is *way* too complex. Split it into
> reasonable helper functions. One arrangement would be:
> struct ArgState
> updateArgAtCallSite(...) {}
> updateForCallSite(...) {}
> identifyCallSites(...) {}
> addIndicatedAtrributes(...) {}
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:157
> @@ +156,3 @@
> + ArgumentState[i].IsDereferenceable = false;
> +
FIXME: isSafeToLoadUnconditionally does not
> understand
> memset.
> + FIXME: We can use getObjectSize for most things, but
> for mallocs
> ----------------
> Is this a correctness bug? Or a missed optimization? I can't tell
> from your FIXME.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:170
> @@ +169,3 @@
> +
trap, so we can use at least that size.
> + Size = std::max(Size, TypeSize);
> + }
> ----------------
> Er, this doesn't look right. We've verified that loading from the
> pointer *up to* a certain number of bytes is safe. We have not
> said
> anything beyond that.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:175
> @@ +174,3 @@
> + if (!ArgumentState[i].MaxSize)
> + ArgumentState[i].IsDereferenceable = false;
> + }
> ----------------
> It looks like IsDerefernceable could become a function of MaxSize
> on
> the state object?
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:184
> @@ +183,3 @@
> + unsigned Align = getKnownAlignment(*AI, DL);
> + if (Align > DL->getABITypeAlignment(ETy))
> + ArgumentState[i].MaxAlign =
> ----------------
> I don't understand why you need this check. Is it safe to record
> an
> align attribute less than the ABI alignment? If not, that seems
> slightly odd.
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:236
> @@ +235,3 @@
> +
> + bool HaveCand = false;
> + for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i)
> ----------------
> I'm not clear why this check is needed. Aren't you only iterating
> 'number of argument' times anyways?
>
> ================
> Comment at: lib/Transforms/IPO/FunctionAttrsTD.cpp:249
> @@ +248,3 @@
> + for (unsigned i = 0, e = ArgumentState.size(); i != e; ++i,
> ++AI)
> {
> + if (!ArgumentState[i].any() || AI->hasInAllocaAttr() ||
> AI->hasByValAttr())
> + continue;
> ----------------
> The first is just an early exit, but what are the second two checks
> for? Are they functionally required?
>
> http://reviews.llvm.org/D4609
>
>
>

reames removed a reviewer: reames.Dec 29 2014, 6:22 PM
Prazek added a subscriber: Prazek.May 8 2017, 5:38 AM

Did you push it to master?

Hi Hal,

I am wondering what's your plan of this patch? I am particularly interested in the propagation of noalias attributes.

Thank you,

Haicheng

Hi Hal,

I am wondering what's your plan of this patch? I am particularly interested in the propagation of noalias attributes.

I've not thought about this in a while, but I'd certainly like to see this functionality. Aside from some refactoring, it needs two things:

  1. The noalias inference needs to be updated to be correct. The easiest way to do this is, instead of actually doing an alias check on the arguments, to call GetUnderlyingObjects and check for disjoint sets of *identified* underlying objects (still checking for capturing, and possibly restricting to local identified objects unless we can do a global capture check). Any set that is disjoint from the others means that its associated argument can be marked as noalias.
  2. I'd like to revisit how the iteration is done and how this fits with the rest of the pass manager. While it does naturally iterate top down, and the CGSCC pass manager normally iterates the other way, there should be a way to put this logic into the part of the pipeline that is iterating with the inliner. I was never satisfied with the idea that we'd only do this once at the beginning of the pipeline. Now that we have a new pass manager, we can think about how this might work in that framework as well. Also, the new pass manager should make it more natural to get a per-function dominator tree, etc. @chandlerc, thoughts on how this might best be done?

Thank you,

Haicheng

Hi Hal,

I am wondering what's your plan of this patch? I am particularly interested in the propagation of noalias attributes.

I've not thought about this in a while, but I'd certainly like to see this functionality. Aside from some refactoring, it needs two things:

  1. The noalias inference needs to be updated to be correct. The easiest way to do this is, instead of actually doing an alias check on the arguments, to call GetUnderlyingObjects and check for disjoint sets of *identified* underlying objects (still checking for capturing, and possibly restricting to local identified objects unless we can do a global capture check). Any set that is disjoint from the others means that its associated argument can be marked as noalias.
  2. I'd like to revisit how the iteration is done and how this fits with the rest of the pass manager. While it does naturally iterate top down, and the CGSCC pass manager normally iterates the other way, there should be a way to put this logic into the part of the pipeline that is iterating with the inliner. I was never satisfied with the idea that we'd only do this once at the beginning of the pipeline. Now that we have a new pass manager, we can think about how this might work in that framework as well. Also, the new pass manager should make it more natural to get a per-function dominator tree, etc. @chandlerc, thoughts on how this might best be done?

Thank you,

Haicheng

That is great. I rebased your code and replaced the alias check with cheking distinct sets of underlying objects. It is working for most benchmarks of llvm-test-suite and spec20xx.

Hi Hal,

I am wondering what's your plan of this patch? I am particularly interested in the propagation of noalias attributes.

I've not thought about this in a while, but I'd certainly like to see this functionality. Aside from some refactoring, it needs two things:

  1. The noalias inference needs to be updated to be correct. The easiest way to do this is, instead of actually doing an alias check on the arguments, to call GetUnderlyingObjects and check for disjoint sets of *identified* underlying objects (still checking for capturing, and possibly restricting to local identified objects unless we can do a global capture check). Any set that is disjoint from the others means that its associated argument can be marked as noalias.
  2. I'd like to revisit how the iteration is done and how this fits with the rest of the pass manager. While it does naturally iterate top down, and the CGSCC pass manager normally iterates the other way, there should be a way to put this logic into the part of the pipeline that is iterating with the inliner. I was never satisfied with the idea that we'd only do this once at the beginning of the pipeline. Now that we have a new pass manager, we can think about how this might work in that framework as well. Also, the new pass manager should make it more natural to get a per-function dominator tree, etc. @chandlerc, thoughts on how this might best be done?

Thank you,

Haicheng

That is great. I rebased your code and replaced the alias check with cheking distinct sets of underlying objects. It is working for most benchmarks of llvm-test-suite and spec20xx.

It occurs to me that the need for this might be removed by another potential improvement (suggested to me by Chandler at EuroLLVM this year): We can make CGSCC analysis wrapper for our current analysis passes in order to allow these analyses to look back through function arguments into the function's callers. To make this work, we'd:

  1. Make CGSCC analysis passes corresponding to our current analysis passes. For example, AA, ValueTracking, etc. (I realize that ValueTracking is not really an analysis pass right now, and while we might want to change that at some point, I don't think it matters for this description).
  2. These pass accept analysis queries and produce results by looking at the calling functions and calling the function/local analysis on the values at each call site, and then intersecting the results. If there are too many call sites, or unknown call sites, then you need to give up.
  3. Provide these CGSCC analysis handles to the corresponding local analyses so that, as they do a recursive analysis, when they hit arguments, then can call into the CGSCC analyses to continue the analysis into the callers.

If we did that, do you think we'd still need this transformation?

It occurs to me that the need for this might be removed by another potential improvement (suggested to me by Chandler at EuroLLVM this year): We can make CGSCC analysis wrapper for our current analysis passes in order to allow these analyses to look back through function arguments into the function's callers. To make this work, we'd:

  1. Make CGSCC analysis passes corresponding to our current analysis passes. For example, AA, ValueTracking, etc. (I realize that ValueTracking is not really an analysis pass right now, and while we might want to change that at some point, I don't think it matters for this description).
  2. These pass accept analysis queries and produce results by looking at the calling functions and calling the function/local analysis on the values at each call site, and then intersecting the results. If there are too many call sites, or unknown call sites, then you need to give up.
  3. Provide these CGSCC analysis handles to the corresponding local analyses so that, as they do a recursive analysis, when they hit arguments, then can call into the CGSCC analyses to continue the analysis into the callers.

    If we did that, do you think we'd still need this transformation?

What you described was close to what I was looking for before I found this patch. I don't know how difficult to implement it, but I can try to hook CGSCC with BasicAA first.

Thank you.

Haicheng

etherzhhb added inline comments.
lib/Transforms/IPO/FunctionAttrsTD.cpp
319

This is unused?

It occurs to me that the need for this might be removed by another potential improvement (suggested to me by Chandler at EuroLLVM this year): We can make CGSCC analysis wrapper for our current analysis passes in order to allow these analyses to look back through function arguments into the function's callers. To make this work, we'd:

  1. Make CGSCC analysis passes corresponding to our current analysis passes. For example, AA, ValueTracking, etc. (I realize that ValueTracking is not really an analysis pass right now, and while we might want to change that at some point, I don't think it matters for this description).
  2. These pass accept analysis queries and produce results by looking at the calling functions and calling the function/local analysis on the values at each call site, and then intersecting the results. If there are too many call sites, or unknown call sites, then you need to give up.
  3. Provide these CGSCC analysis handles to the corresponding local analyses so that, as they do a recursive analysis, when they hit arguments, then can call into the CGSCC analyses to continue the analysis into the callers.

    If we did that, do you think we'd still need this transformation?

Hi Hal,

I spent some time thinking about the alternative method you mentioned here. I think the alternative can cover many cases that this patch tries to solve, but this patch still could bring some additional advantages.

  1. The new pass provides a place dedicated to do the Top-Down propagation. It is simpler and easy to maintain. We can add more attributes to propagate in the future.
  2. Not only the direct users of AA or value tracking can get benefits from this patch, any pass runs after this new pass can easily get access to the propagated attributes.
  3. The alternative one has a long way to go, but your patch is almost there. I spend some time fixing the bugs and it is working for most of cases now.

I am thinking maybe the two methods can co-exist. This patch can cut the number of recursive calls and together they can cover more cases. Do you think it is worthwhile if I continue working on this patch?