This is an archive of the discontinued LLVM Phabricator instance.

Add CalledValuePropagation pass
ClosedPublic

Authored by mssimpso on Aug 31 2017, 2:09 PM.

Details

Summary

This patch adds a new transformation for propagating values used for indirect calls. The idea for this pass came from the review of D36432. The pass performs an IPSCCP-like analysis, propagating functions to indirect call sites and then attaching metadata to the call sites indicating the set of functions they may potentially target at run-time.

This metadata may be consumed by later transformations. My initial plan is to use the metadata to perform indirect call promotion for call sites that are known to target only two functions. The metadata can also be used, for example, to allow call sites to intersect the function attributes of the functions they are known to target (i.e., in CallSiteBase::hasFnAttr).

Although I've implemented this as a separate pass, the functionality could be added to IPSCCP without too much effort. I chose a separate pass primarily as an opportunity to revive the unused generic sparse propagation solver, but this isn't strictly necessary.

If patch sounds reasonable, please take a look at the dependent patches as well. D37354 adds the new kind of metadata, and D37353 enables the generic sparse propagation solver to work interprocedurally similar to IPSCCP. I've added a simple demonstration test to this patch, but I will add more if the overall approach looks good.

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso created this revision.Aug 31 2017, 2:09 PM
efriedma edited edge metadata.Aug 31 2017, 4:52 PM

I'd like to see a better motivating example; the specific example here could be handled by something more targeted in arg-promotion or something.

What's the difference between overdefined and untracked?

I'd like to see a better motivating example; the specific example here could be handled by something more targeted in arg-promotion or something.

Sure, Eli. Thanks for taking a look. I can add another example or two as test cases, although I'd like to try to not make them too complicated for now. The specific example I included here was pulled from D36432, and may not be the best. I'm currently only propagating through calls/returns, loads/stores through global variables, and phis/selects, which may be all we would want to handle for this kind of task.

What's the difference between overdefined and untracked?

The untracked value is something the generic solver added, and there isn't a big conceptual difference between it and overdefined(at least in this patch). From the comments in the generic solver, it uses untracked to represent "something that is obviously uninteresting to the analysis" in order to "avoid pointless work." Practically, I think this just means that it doesn't maintain mappings for these values in its internal state maps. But yeah, initializing a value as overdefined instead of untracked should end up with the same result. And if this were implemented in IPSCCP, that is what we would do.

davide edited edge metadata.Sep 1 2017, 11:54 AM

As Eli, I'd like to see other examples (I'm confident you can find some). In the meanwhile, a first review.

lib/Transforms/IPO/CalledValuePropagation.cpp
36–38 ↗(On Diff #113465)

Is there a way we can avoid somewhat arbitrary cutoffs? They've been us a lot in the past.

52 ↗(On Diff #113465)

As there's an increasing interest in IPO in llvm (yay) we might consider taking this kind of generic function and move to a common helper (e.g. IPO/Utils). FWIW, GCC has something like that. Also, I'm pretty sure IPSCCP has the same exact function (at least as far as I remember) and in case we find a bug there we need to update a single copy.

58 ↗(On Diff #113465)

I'm unsure about this one. I had a bug lying around as sparse conditional constant propagation has a similar problem (i.e. it ignores variables which address is taken, although I had hard time to understand the concept of address taken for GVs).

108 ↗(On Diff #113465)

is this deterministic?

141 ↗(On Diff #113465)

I'm not familiar with this style of comments, and I don't think it's actually common in LLVM (although I may be wrong).

338 ↗(On Diff #113465)

DenseSet?

344 ↗(On Diff #113465)

Maybe at some point we can get rid of this.

382 ↗(On Diff #113465)

I expected this to preserve something? [probably the same set that IPSCCP preserves]

Hi Davide,

Thanks very much for the initial review! I'm working on adding a few more motivating test cases that demonstrate missed optimization opportunities. I'll upload those shortly.

lib/Transforms/IPO/CalledValuePropagation.cpp
36–38 ↗(On Diff #113465)

The cut-off used here is to prevent the number of lattice values we have to maintain in a std::set from growing too large. As I mentioned in the comments, the number of possible values could technically grow quite large (set of all subsets). I don't think this is likely to occur in practice, though.

One thing we could do to relax any cut-off is to make the lattice values take up less space. We could do this by representing the set of target functions as a bit vector, for example, instead of a vector of function pointers. For this particular task, though, I'm not sure how useful it would be to leave the number of target functions unconstrained.

I envisioned two main uses for this work (although I'm sure we can come up with more): indirect call promotion, and intersecting function attributes at call sites. My thinking was that if we wanted to do indirect call promotion, we probably would want to give up if the call could target a lot of functions (i.e., if we end up having to do more than insert a simple if-else). Intersecting attributes is less clear to me, but I was thinking that the chance that all possible call targets would have interesting attributes (say, a norecurse or readnone) would be small if the set of targets gets very large.

52 ↗(On Diff #113465)

This makes sense to me. Yes, the check for trackable functions and global variables is more-or-less taken from IPSCCP.

58 ↗(On Diff #113465)

I agree. This was PR33143, for reference. I had originally thought we could add a hasAddressTaken to GlobalVariable similar to what we have in Function, but after hearing feedback from Eli, I'm not sure that would help. In this case and in IPSCCP, we're interested in whether we can track the values loaded/stored at a given global variable. This is why we also have to check that the memory operations are not volatile.

108 ↗(On Diff #113465)

It is, but it doesn't look that way because I sort and unique the functions vector before constructing the lattice values (in MergeValues). The functions vector is essentially a sorted set that I tried to make more efficient by using a SmallVector. Also, the functions vector doesn't change after a lattice value object is constructed. Multiple LLVM values will map to the same lattice value object (see also the DenseSet comment below).

141 ↗(On Diff #113465)

It's used in a few places, but I'm happy to document these functions separately. My aim was to not be overly repetitive.

338 ↗(On Diff #113465)

As written, I don't think DenseSet will work, unfortunately. Internally, the generic solver represents lattice values as void pointers (it's old code I guess), so they either have to fit in that amount of space (like the PointerIntPair in SCCP) or they have to index some uniquing data structure. std::set doesn't invalidate iterators after insertion like DenseSet does. That let's us use the address of our custom lattice values as the void pointer in the generic solver. This means that LatticeVals holds the unique lattice value objects, and multiple LLVM values end up mapping to the address of same lattice value object. It's possible there is a better way to interface with the generic solver, though.

344 ↗(On Diff #113465)

I agree. Eli mentioned this as well. Untracked and overdefined are more-or-less the same.

382 ↗(On Diff #113465)

This is what IPSCCP does, actually. But in this case, since we're only adding metadata we could just preserve all. We could even make this an analysis, like TBAA, which also just attaches metadata.

mssimpso updated this revision to Diff 113612.Sep 1 2017, 3:21 PM

Add some more interesting/motivating test cases. These are abstractions of some cases I've come across in benchmarks. They could be optimized if we knew something about the targets of the indirect calls.

mssimpso updated this revision to Diff 113614.Sep 1 2017, 3:28 PM

Fixed a comment in one of the new test cases. Thanks again for taking a look!

mssimpso updated this revision to Diff 114413.Sep 8 2017, 12:27 PM
mssimpso marked 4 inline comments as done.

Addressed comments from Davide and Hal

  • Incorporated feedback from D37354 (changed metadata from !targets to !callees, etc.).
  • Moved common IPO utility functions to shared location (D37638)
  • Separated comment groups into individual comments
  • Marked the pass "preserves all"
mssimpso updated this revision to Diff 114689.Sep 11 2017, 2:13 PM

Renamed "!targets" to "!callees" in a few comments I missed from the previous revision

mssimpso updated this revision to Diff 115443.Sep 15 2017, 11:55 AM
mssimpso marked an inline comment as done.

Made untracked values overdefined.

After looking at the generic solver more closely, I found that when a value is initialized to untracked, it's users aren't notified and added to the work list for processing. This is the intended behavior, but when using untracked, I think it's going to be very easy to make a mistake and not see the entire module. It's probably better for us to track all the values, and mark the uninteresting ones overdefined. This update changes the patch to do this.

Hi everyone,

Are there anymore comments on this patch, or any of its dependencies? Thanks!

dberlin edited edge metadata.Oct 2 2017, 11:18 AM

I see this review has been hanging around a while, so i'll take a stab at reviewing it.

(Random note: I know you didn't do this, but getOrInitValueState is very verbose for a thing that everything is probably going to use by default.
IMHO, the API should be getOrInitValueState is renamed to getValueState, getValueState is renamed to getExistingValueState.
Similarly with the other calls.

Since there are no other users, this may be a good time to change it in a followup)

Generally looks very very good. I don't think i have any algorithmic complaints here at all.
If you agree with the templating vs void ptrs, i'm happy to go commit that patch.

lib/Transforms/IPO/CalledValuePropagation.cpp
203 ↗(On Diff #115443)

Again, not a thing you have to address here, but is there a reason it's not just templated?

IE define an interface lattice vals must meet, and template it relying on that interface to exist.

This is what we do with GraphTraits, for example.

Using void pointers and having to go through these machinations seems .... ugly

(Edit: I have a patch to template it, here: https://reviews.llvm.org/differential/diff/117397/)

206 ↗(On Diff #115443)

DenseSet?

(Do you need the ordering?)

228 ↗(On Diff #115443)

Nitpick: IndirectCalls.insert(I)

Generally I feel like it's getting there. Some small comments, but I expect this to be ready to be committed soon'ish

lib/Transforms/IPO/CalledValuePropagation.cpp
181–190 ↗(On Diff #115443)

you can simplify just doing
return OS <<

286 ↗(On Diff #115443)

DenseSet maybe?

lib/Transforms/IPO/PassManagerBuilder.cpp
717 ↗(On Diff #115443)

I'd make an RFC on llvm-dev as people might have custom pipelines.

And sorry for letting this hanging for so long, but life is a little crazy these days :)

I see this review has been hanging around a while, so i'll take a stab at reviewing it.

(Random note: I know you didn't do this, but getOrInitValueState is very verbose for a thing that everything is probably going to use by default.
IMHO, the API should be getOrInitValueState is renamed to getValueState, getValueState is renamed to getExistingValueState.
Similarly with the other calls.

Since there are no other users, this may be a good time to change it in a followup)

Generally looks very very good. I don't think i have any algorithmic complaints here at all.
If you agree with the templating vs void ptrs, i'm happy to go commit that patch.

Thanks for the review! Yes, templating would be much nicer than void pointers. Feel free to go ahead with that, and I'll update this patch accordingly.

lib/Transforms/IPO/CalledValuePropagation.cpp
203 ↗(On Diff #115443)

Yeah, that would be much nicer - the generic solver is old code. Thanks for working on the patch, and please feel free to go ahead and commit it. You should also update the comment at the top of AbstractLatticeFunction, since it talks about the void pointer. I can rebase my work on top of that.

206 ↗(On Diff #115443)

Sounds good. If we template the lattice function, I don't think std::set will be needed anymore.

Because the values had to be void pointer's, I needed addresses of my custom values in order to interface with the generic solver. DenseSet couldn't be used because the values can move around in memory after they've been added to the set.

228 ↗(On Diff #115443)

Thanks!

Generally I feel like it's getting there. Some small comments, but I expect this to be ready to be committed soon'ish

Thanks very much, Davide!

lib/Transforms/IPO/CalledValuePropagation.cpp
181–190 ↗(On Diff #115443)

Thanks!

286 ↗(On Diff #115443)

Yeah, we can use DenseSet if we template the lattice function and get rid of the void pointers.

lib/Transforms/IPO/PassManagerBuilder.cpp
717 ↗(On Diff #115443)

Good idea.

mssimpso updated this revision to Diff 117737.Oct 4 2017, 2:04 PM
mssimpso marked an inline comment as done.

Addressed comments from Danny and Davide.

  • Regarding the DenseSet, I just removed the set all together. I was previously using the set to unique the lattice values in order to save space. But in the common case, the lattice values should be very small. So I don't think it's worth worrying about that.
  • I also moved the uniquing and sorting of the functions vector from MergeValues to the lattice value constructor. I think this makes it more obvious that the container is set-like. Using a vector instead of set is handy for equality testing, which the lattice value is required to implement.

I'm a bit confused why you use a vector instead of std::set or DenseSet. It looks like either would be better in every way?

You say " Using a vector instead of set is handy for equality testing, which the lattice value is required to implement."

But i honestly don't get it :)

Other than that (and the associated changes to merge, etc. That could just use set_union), this looks reasonable to me.

Also, if you used sets, you could just use set.swap in the constructor (or move constructors) to avoid the extra copying.

mssimpso updated this revision to Diff 118238.Oct 9 2017, 11:17 AM

Addressed Danny's comments.

  • Changed the data structure for holding function pointers from SmallVector to std::set. I was probably over thinking this before, but you're right, a set makes more sense. I went with std::set over DenseSet for a few reasons. First, DenseSet doesn't work with std::inserter, which I used for std::set_union (see below). And second, std::set is ordered, so the functions will always appear in a deterministic order in the !callees metadata, which is useful for testing.
  • Used std::set_union in MergeValues
  • Changed CVPLatticeVal constructor to use the set move constructor to avoid unnecessary copying.
davide accepted this revision.Oct 9 2017, 11:21 AM

This is good enough for me now (I was reviewing the version without std::set while you were updating it, but I think it's just a technical detail and I wouldn't hold off the review just for that).
LGTM but please to wait for Danny to sign off as well. Thank you for your work/patience!

This revision is now accepted and ready to land.Oct 9 2017, 11:21 AM
dberlin accepted this revision.Oct 9 2017, 11:26 AM

thanks for working through this :)

Danny/Davide,

Thanks very much for all of the reviews! Just to clarify, are you also OK with the dependent patches (the changes in D37353 to make the generic solver interprocedural and the NFC in D37638 that pulls out the IPO utilities)?

davide added a comment.Oct 9 2017, 1:30 PM

Yes, I'm fine with the dependent patches.
As this is a large-ish change (and involves a fair amount of refactoring), if I was you, I'd send a mail to llvm-dev to avoid surprises explaining what's your plan (so that if there are people relying on bits being in a particular position for their out-of-tree passes/analyses they at least have an heads-up of what's going on)

Yes, I'm fine with the dependent patches.
As this is a large-ish change (and involves a fair amount of refactoring), if I was you, I'd send a mail to llvm-dev to avoid surprises explaining what's your plan (so that if there are people relying on bits being in a particular position for their out-of-tree passes/analyses they at least have an heads-up of what's going on)

Yes, will do. Thanks, Davide. Danny has some remaining comments on the other patches that I'll take care of in the meantime.

mssimpso updated this revision to Diff 119201.Oct 16 2017, 1:36 PM

I updated this patch following the most recent changes to the generic solver, which allow clients to define custom LatticeKeys. I'll email llvm-dev before committing.

This revision was automatically updated to reflect the committed changes.