This pass splits globals into elements using inbounds annotations on
getelementptr indices.
Depends on D22793
Differential D22295
Introduce GlobalSplit pass. pcc on Jul 12 2016, 7:46 PM. Authored by
Details This pass splits globals into elements using inbounds annotations on Depends on D22793
Diff Detail
Event Timeline
Comment Actions The semantics you've implemented seem extremely dangerous. You're making some very strong assumptions about how exactly the optimizer manipulates GEPs. I'm not sure whether the optimizer actually breaks this particular construct, but it easily could. The optimizer can and will dig through a ConstantExpr and transform the use into something different. You could even run into issues with the optimizer manipulating non-ConstantExpr GEPs: for example, the optimizer could transform a series of virtual calls into a loop, then index that loop by the vtable pointer; at that point, splitting the global using the obvious algorithm makes the end pointer point at the wrong global. (I don't think we do this particular kind of loop re-rolling at the moment, and LSR probably runs after the global splitting pass, but the point is that there are side-effects which have nothing to do with globals.) If you want GEPs with special rules, you should attach metadata to the GEPs or something like that. A stronger version of inbounds seem more generally useful, anyway; you could use it for other things like alias analysis. Comment Actions Thanks Eli. I had similar concerns, but they seemed largely theoretical, as I was unable to come up with a good example of a optimization pass that could reasonably break this, but your example of a loop re-roller seems at least plausible. I will think about whether there's some reasonable IR extension we can make here. I'm thinking either a constexpr extension, or maybe something with aliases representing the individual objects. Comment Actions Here's the extension I had in mind: allow the "inbounds" keyword to appear before any index operand of a getelementptr constant. That keyword causes any derived pointer outside of the bounds of the element referred to by that index, other than the pointer one past the end of the pointer, to be treated as a poison value. For example: i8** getelementptr inbounds ({[4 x i8*], [4 x i8*]}, {[4 x i8*], [4 x i8*]}* @_ZTVfoo, i32 0, inbounds i32 1, i32 2) This is a reference to the address point of the second element of _ZTVfoo, a virtual table group with two virtual tables, such that any pointer extending beyond the bounds of the second virtual table is a poison value. Please let me know what you think.
Comment Actions
Comment Actions I don't see how unnamed_addr is relevant here. It means that the code won't compare the address of the global in question to the address of a different global. It doesn't have anything to do with comparisons involving different addresses within the same global. Comment Actions I thought about that too, but isn't a split global just like a collection of several globals? Comment Actions I'll try to outline a scenario where this misbehaves... Suppose you have an unnamed_addr global, call it "const int a[2][5]", and a loop which sums up all the elements of a[0]. You can express the sum something like this: int* p = a[0]; while (p != a[1]) { sum += *p; ++p; }. Both a[0] and a[1] can be marked inrange: a[0] because all the loads are in range, a[1] because there aren't any memory operations over it. The loop explodes for obvious reasons if you split a[1] into a separate global. The obvious way to make this illegal is to make marking a[1] inrange illegal (the "comparisons are undefined" rule). There isn't any marking you can put on the global which will help because the optimizer can insert the pointer comparison without knowing a global is even involved. Comment Actions Thanks Eli. Apologies for the delayed response, I just returned from a long vacation. Having given this a little more thought, the "comparisons are undefined" rule, while in my opinion perhaps too conservative, does seem like a reasonable starting point. I'll update this patch and the other one. Comment Actions It gets discarded. This isn't important for the vtable use case (vtables don't normally have debug info of their own). I think we may be able to represent this using DW_OP_piece, but that can be done separately.
Comment Actions LGTM
|
This is N^2.
N is probably very small in practice, but I guess there can be outliers. Consider switching to an NlogN implementation?