This is an archive of the discontinued LLVM Phabricator instance.

Introduce GlobalSplit pass.
ClosedPublic

Authored by pcc on Jul 12 2016, 7:46 PM.

Details

Summary

This pass splits globals into elements using inbounds annotations on
getelementptr indices.

Depends on D22793

Diff Detail

Event Timeline

pcc updated this revision to Diff 63773.Jul 12 2016, 7:46 PM
pcc retitled this revision from to Introduce !splitpoint metadata and GlobalSplit pass..
pcc updated this object.
pcc added a reviewer: eugenis.
pcc added subscribers: llvm-commits, krasin.
hfinkel added inline comments.
docs/LangRef.rst
4904 ↗(On Diff #63773)

I find the specification of the semantics somewhat unclear. What I think this is trying to say is that the splitpoint metadata modifies the definition of 'inbounds' on GEPs, so that instead after:

The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.

We add something like this:

If the object is a global partitioned with one or more !splitpoint metadata, then in bounds addresses are all of the addresses that point into the same partition of the object as the base address. The address one byte past the end of the object is part of the last partition.

Maybe this is only true if the first GEP index value is a constant?

Alternatively, we might specify that the optimizer is free to add an arbitrary amount of padding at the split points. I'm not sure this is strong enough because the later partitions might actually end up with addresses before the first partition.

pcc added inline comments.Jul 13 2016, 3:17 PM
docs/LangRef.rst
4904 ↗(On Diff #63773)

Is that enough, though? It would also be possible to calculate an offseted pointer with 2's complement arithmetic with a non-inbounds getelementptr, or with ptrtoint/add/inttoptr. In both of those cases, per the as-if rule, I'd expect those calculations to produce addresses within the global variable when crossing a split point without language to the contrary.

What I was thinking was that we provide a definition of "object" such that "splitpoint" metadata would allow a global variable to be represented using multiple objects.

pcc updated this revision to Diff 63885.Jul 13 2016, 4:55 PM
  • Improve docs
hfinkel added inline comments.Jul 13 2016, 8:23 PM
docs/LangRef.rst
4918 ↗(On Diff #63885)

Fair enough; I'm not sure how this is different in practice.

The semantics you've implemented seem extremely dangerous. You're making some very strong assumptions about how exactly the optimizer manipulates GEPs. I'm not sure whether the optimizer actually breaks this particular construct, but it easily could. The optimizer can and will dig through a ConstantExpr and transform the use into something different.

You could even run into issues with the optimizer manipulating non-ConstantExpr GEPs: for example, the optimizer could transform a series of virtual calls into a loop, then index that loop by the vtable pointer; at that point, splitting the global using the obvious algorithm makes the end pointer point at the wrong global. (I don't think we do this particular kind of loop re-rolling at the moment, and LSR probably runs after the global splitting pass, but the point is that there are side-effects which have nothing to do with globals.)


If you want GEPs with special rules, you should attach metadata to the GEPs or something like that. A stronger version of inbounds seem more generally useful, anyway; you could use it for other things like alias analysis.

pcc added a comment.Jul 14 2016, 11:05 AM

Thanks Eli. I had similar concerns, but they seemed largely theoretical, as I was unable to come up with a good example of a optimization pass that could reasonably break this, but your example of a loop re-roller seems at least plausible.

I will think about whether there's some reasonable IR extension we can make here. I'm thinking either a constexpr extension, or maybe something with aliases representing the individual objects.

pcc added a comment.Jul 15 2016, 12:20 PM

Here's the extension I had in mind: allow the "inbounds" keyword to appear before any index operand of a getelementptr constant. That keyword causes any derived pointer outside of the bounds of the element referred to by that index, other than the pointer one past the end of the pointer, to be treated as a poison value. For example:

i8** getelementptr inbounds ({[4 x i8*], [4 x i8*]}, {[4 x i8*], [4 x i8*]}* @_ZTVfoo, i32 0, inbounds i32 1, i32 2)

This is a reference to the address point of the second element of _ZTVfoo, a virtual table group with two virtual tables, such that any pointer extending beyond the bounds of the second virtual table is a poison value.

Please let me know what you think.

pcc updated this revision to Diff 65455.Jul 25 2016, 5:50 PM

Use new IR extension: D22793

pcc retitled this revision from Introduce !splitpoint metadata and GlobalSplit pass. to Introduce GlobalSplit pass..Jul 25 2016, 5:52 PM
pcc updated this object.
eugenis added inline comments.Jul 26 2016, 1:35 PM
lib/Transforms/IPO/GlobalSplit.cpp
56

Btw would you like to add a paragraph on !type metadata to http://llvm.org/docs/LangRef.html ?

66

maybe give the new global a human-readable IR name

111

avoid creating the new undef value if there are no remaining uses

eli.friedman added inline comments.Jul 26 2016, 1:39 PM
lib/Transforms/IPO/GlobalSplit.cpp
52

Don't you also need to check the users of the GEP? The result of getInBoundsIndex() only has a semantic effect for loads and stores.

pcc updated this revision to Diff 65618.Jul 26 2016, 3:46 PM
pcc marked 3 inline comments as done.
  • Refresh, address review comments
lib/Transforms/IPO/GlobalSplit.cpp
52

If each user of a global is an inrange getelementptr constant (and nothing else can take its address, i.e. it has local linkage), it follows that any loads from or stores to that global must use a pointer derived from an inrange getelementptr constant, which is sufficient to allow us to apply the splitting transform.

Added a comment documenting this.

56

r276817

eli.friedman added inline comments.Jul 26 2016, 3:51 PM
lib/Transforms/IPO/GlobalSplit.cpp
52

My point was that loads and stores aren't the only way to use a pointer; what about pointer value comparisons?

pcc added inline comments.Jul 26 2016, 4:09 PM
lib/Transforms/IPO/GlobalSplit.cpp
52

Ah, yes. I think we can address that with language in the langref specifying that the result of a comparison is undefined if either operand was derived from an inrange GEP, with the exception of comparisons where both operands were derived from a GEP which was evaluated with identical operands up to and including the inrange operand. I'll update my other patch.

pcc updated this revision to Diff 65652.Jul 26 2016, 7:31 PM
  • Require at least local_unnamed_addr on globals to be split

I don't see how unnamed_addr is relevant here. It means that the code won't compare the address of the global in question to the address of a different global. It doesn't have anything to do with comparisons involving different addresses within the same global.

I don't see how unnamed_addr is relevant here. It means that the code won't compare the address of the global in question to the address of a different global. It doesn't have anything to do with comparisons involving different addresses within the same global.

I thought about that too, but isn't a split global just like a collection of several globals?

I'll try to outline a scenario where this misbehaves...

Suppose you have an unnamed_addr global, call it "const int a[2][5]", and a loop which sums up all the elements of a[0]. You can express the sum something like this: int* p = a[0]; while (p != a[1]) { sum += *p; ++p; }. Both a[0] and a[1] can be marked inrange: a[0] because all the loads are in range, a[1] because there aren't any memory operations over it. The loop explodes for obvious reasons if you split a[1] into a separate global.

The obvious way to make this illegal is to make marking a[1] inrange illegal (the "comparisons are undefined" rule). There isn't any marking you can put on the global which will help because the optimizer can insert the pointer comparison without knowing a global is even involved.

pcc added a comment.Sep 7 2016, 11:40 AM

Thanks Eli. Apologies for the delayed response, I just returned from a long vacation.

Having given this a little more thought, the "comparisons are undefined" rule, while in my opinion perhaps too conservative, does seem like a reasonable starting point. I'll update this patch and the other one.

pcc updated this revision to Diff 70626.Sep 7 2016, 4:57 PM
  • Revert "Require at least local_unnamed_addr on globals to be split"
pcc updated this revision to Diff 75492.Oct 21 2016, 2:51 PM

Refresh

pcc added a comment.Nov 10 2016, 2:45 PM

This is unblocked now.

eugenis edited edge metadata.Nov 10 2016, 4:10 PM

What happens to !dbg metadata when a global is split?

llvm/lib/Transforms/IPO/GlobalSplit.cpp
82 ↗(On Diff #75492)

This is N^2.
N is probably very small in practice, but I guess there can be outliers. Consider switching to an NlogN implementation?

103 ↗(On Diff #75492)

SmallVector?

pcc updated this revision to Diff 78265.Nov 16 2016, 2:35 PM
pcc marked an inline comment as done.
pcc edited edge metadata.
  • Use SmallVector
pcc added a comment.Nov 16 2016, 2:37 PM

What happens to !dbg metadata when a global is split?

It gets discarded. This isn't important for the vtable use case (vtables don't normally have debug info of their own). I think we may be able to represent this using DW_OP_piece, but that can be done separately.

llvm/lib/Transforms/IPO/GlobalSplit.cpp
82 ↗(On Diff #75492)

On my machine this pass takes 0.4 seconds to run during an LTO link of chrome. So I don't think it's worth making major changes for performance here.

eugenis accepted this revision.Nov 16 2016, 3:30 PM
eugenis edited edge metadata.

LGTM
Please add a FIXME to preserve/update debug metadata.

llvm/lib/Transforms/IPO/GlobalSplit.cpp
82 ↗(On Diff #75492)

OK, that's reasonable.

This revision is now accepted and ready to land.Nov 16 2016, 3:30 PM
This revision was automatically updated to reflect the committed changes.