A nonnull inbounds GEP cannot be derived from a null pointer, so this patch marks the GEP's pointer operand as nonnull when the GEP itself is known to be nonnull. This saves around 0.05%-0.1% of binary size at Google and 0.06% of the binary size of clang. I don't have commit rights. Danila Kutenin. kutdanila@yandex.ru
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2134 | Doesn't this cause endless recursion back into isKnownNonNullFromDominatingCondition()? |
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2134 | isKnownNonZero has a Depth cutoff that is checked before it calls isKnownNonNullFromDominatingCondition. |
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2134 | Sure, but before that cutoff triggers, won't we waste time mutually recursing with no progress? |
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2134 | To avoid recomputing the same values, we could skip the recursion and only look for constants. I believe that is the most common case. |
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2134 | Done. |
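
To make the concern in this thread concrete, here is a minimal toy model of the call shape being discussed. It is not LLVM code: `knownNonZero`, `fromDominatingCondition`, `MaxDepth` and `countCalls` are hypothetical stand-ins, and the cutoff value is an assumption chosen only for illustration. It counts how many calls happen before a depth cutoff stops two mutually recursive helpers, compared with a variant that does a non-recursive (constant-only) check.

```cpp
// Toy model only: these are NOT the LLVM functions, just the same call shape.
constexpr unsigned MaxDepth = 6; // assumed cutoff, for illustration only

static unsigned Calls = 0; // number of helper invocations in one query

bool knownNonZero(unsigned Depth, bool Recurse);

// Stand-in for a dominating-condition walk that inspects one GEP use.
bool fromDominatingCondition(unsigned Depth, bool Recurse) {
  ++Calls;
  if (!Recurse)
    return false; // constant-only check: never re-enters knownNonZero
  return knownNonZero(Depth + 1, Recurse); // re-enters with no new facts
}

// Stand-in for the top-level query with a depth cutoff.
bool knownNonZero(unsigned Depth, bool Recurse) {
  ++Calls;
  if (Depth >= MaxDepth)
    return false; // the cutoff ends the recursion, but only after MaxDepth rounds
  return fromDominatingCondition(Depth, Recurse);
}

// Runs one query from depth 0 and reports how many calls it took.
unsigned countCalls(bool Recurse) {
  Calls = 0;
  knownNonZero(0, Recurse);
  return Calls;
}
```

In this model the recursive variant makes 13 calls per query and the constant-only variant makes 2. In a real use walk that cost would be paid once per visited use, which is why the cutoff alone does not remove the wasted work.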
I'm somewhat concerned about the general direction here. This looks through a GEP, but what about bitcasts? What about a bitcast of a GEP? A GEP of a bitcast? A longer chain of GEPs and bitcasts? We can't reasonably walk the whole use graph, and this is really pushing the bounds of what is appropriate for a ValueTracking helper.
We have a more principled version of this optimization in LVI, which scans whole blocks for pointer dereferences and records their underlying objects, thus handling this in full generality. The only caveat is that LVI only uses this information to optimize the terminator instruction, because it does not store where exactly inside the block the first dereference occurs. If the motivation here is handling of non-terminator comparisons, then that might be a better avenue to explore?
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
2132 | How can a GEP not have pointer type? | |
2134 | I don't get it. What is this isGEPKnownNonNull() check here for? Why do the GEP indices matter at all? | |
2135 | This effectively raises the maximum uses walked from DomConditionsMaxUses to DomConditionsMaxUses^2, because you can walk that many GEPs, and then that many uses of each GEP. Shouldn't this be counting towards the main NumUsesExplored, rather than a separate limit? |
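
The point about the two limits can be illustrated with a small counting model. This is a sketch under stated assumptions, not the patch's code: `MaxUses` stands in for DomConditionsMaxUses with an assumed value of 20, and the use graph is a hypothetical one where the pointer has `Fanout` GEP uses and each GEP has `Fanout` uses of its own.

```cpp
#include <cstddef>

// Assumed stand-in for DomConditionsMaxUses; the value is illustrative.
constexpr std::size_t MaxUses = 20;

// Separate limits: one counter for uses of the pointer and a fresh
// counter for the uses of every GEP that is walked.
std::size_t visitsWithSeparateLimits(std::size_t Fanout) {
  std::size_t Visited = 0;
  std::size_t OuterExplored = 0;
  for (std::size_t U = 0; U < Fanout; ++U) {
    if (OuterExplored++ >= MaxUses)
      break;
    ++Visited; // visited one GEP use of the pointer
    std::size_t InnerExplored = 0; // reset per GEP: this is the problem
    for (std::size_t G = 0; G < Fanout; ++G) {
      if (InnerExplored++ >= MaxUses)
        break;
      ++Visited; // visited one use of that GEP
    }
  }
  return Visited;
}

// Shared limit: GEP uses count towards the same budget as pointer uses.
std::size_t visitsWithSharedLimit(std::size_t Fanout) {
  std::size_t Visited = 0;
  std::size_t NumUsesExplored = 0;
  for (std::size_t U = 0; U < Fanout; ++U) {
    if (NumUsesExplored++ >= MaxUses)
      break;
    ++Visited;
    for (std::size_t G = 0; G < Fanout; ++G) {
      if (NumUsesExplored++ >= MaxUses)
        break;
      ++Visited;
    }
  }
  return Visited;
}
```

With a fanout of 100, the separate-limit walk visits 420 uses (20 GEPs plus 20 uses of each), while the shared-limit walk stops after 20. For small fanouts that never hit the limit, both visit the same uses.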
I was looking at LVI and haven't come up with anything meaningful, probably because I am not familiar with that part.
I took a more statistical approach at Google, looking at what users most often do with pointers. It turned out that dereference + check and pointer arithmetic + dereference + check are by far the most common patterns. I found that dereference + check is already optimized in ValueTracking, so I decided to do the same for pointer arithmetic. I agree that it is starting to get a little bloated. GCC does this kind of thing consistently for C/C++ code, and LLVM is far behind in tracking the (non)nullness of pointers. A 0.1% saving in big binaries looks like a big low-hanging win.
Currently I don't want to go to LVI if I have a choice.
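
For readers less familiar with the pattern, a source-level instance of "pointer arithmetic + dereference + check" might look like the sketch below. The `Node` type and `readThenCheck` function are hypothetical and do not come from the patch or from any measured code base. The element access is the kind of expression that is typically lowered to an inbounds GEP followed by a load, and the later null check on the base pointer is the comparison the patch aims to fold.

```cpp
#include <cstddef>

// Hypothetical data type, used only to force pointer arithmetic.
struct Node {
  int Key;
  int Value;
};

int readThenCheck(Node *Nodes, std::size_t I) {
  // Pointer arithmetic plus dereference: reads through an address
  // computed from Nodes, not through Nodes itself.
  int V = Nodes[I].Value;
  // Check on the base pointer after the dereference. If Nodes were null,
  // the access above would already have been undefined behavior, so a
  // compiler that tracks this may treat the branch as dead.
  if (Nodes == nullptr)
    return -1;
  return V;
}
```

Whether a given compiler version actually removes the branch is not claimed here. The sketch only shows the shape of code the summary's reasoning applies to.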
I basically agree with @nikic here.
There is a huge amount of low-hanging fruit like this,
but the approach just doesn't seem viable to me.
I would instead like to see the Attributor finally enabled;
such things would be much more straightforward there.
Sorry for the not very helpful feedback.