This is an archive of the discontinued LLVM Phabricator instance.

ValueTracking: Put DataLayout reference into the Query structure, NFC.
ClosedPublic

Authored by MatzeB on Jan 14 2016, 3:46 PM.

Details

Summary

After r251146, computeKnownBits() is called multiple times for every instruction in the program, which resulted in a 3% compile-time regression. This patch tries to get some of that compile time back by optimizing the function:

Put DataLayout reference into the Query structure. It looks nicer and improves the compile time of a typical clang -O3 -emit-llvm run by ~0.6% for me.
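
For readers without the diff in front of them, the shape of the change is roughly the following. This is a simplified sketch, not the actual ValueTracking.cpp code; the real Query has more members and the helpers have more parameters.

  // Simplified sketch of the change, not the actual LLVM code.
  #include "llvm/ADT/APInt.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Value.h"
  using namespace llvm;

  namespace {
  struct Query {
    const DataLayout &DL;    // new member added by this patch
    const Instruction *CxtI; // other members (assumption cache, DT, ...) elided
  };

  // Before: DL is threaded through every helper as an explicit argument.
  void computeKnownBitsBefore(const Value *V, APInt &KnownZero,
                              APInt &KnownOne, const DataLayout &DL,
                              unsigned Depth, const Query &Q);

  // After: DL travels inside Q and is read as Q.DL where it is needed, so
  // every call site passes one fewer argument.
  void computeKnownBitsAfter(const Value *V, APInt &KnownZero,
                             APInt &KnownOne, unsigned Depth, const Query &Q);
  } // end anonymous namespace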

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 44940.Jan 14 2016, 3:46 PM
MatzeB retitled this revision from to ValueTracking: Put DataLayout reference into the Query structure, NFC..
MatzeB updated this object.
MatzeB added reviewers: hfinkel, majnemer, sanjoy.
MatzeB set the repository for this revision to rL LLVM.
MatzeB added a subscriber: llvm-commits.

IIUC, you just save a pointer argument to computeKnownBits, right? Is that the only reason for the speedup? That's pretty surprising to me.

lib/Analysis/ValueTracking.cpp
461 (On Diff #44940)

The changes in this function seem unrelated to the DataLayout; this is a cleanup that you could commit separately, I feel.

In D16205#327411, @joker.eph wrote:

IIUC, you just save a pointer argument to computeKnownBits, right? Is that the only reason for the speedup? That's pretty surprising to me.

Yes, it's "just" the pointer argument, but the code here is called often and recursively, so this can add up to a smaller stack size and reduced reloading/spilling.
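
To make that concrete, here is a hypothetical illustration (not the actual ValueTracking code) of the effect on a recursive helper: with DL as its own parameter it is live across every recursive call and has to survive each call in a callee-saved register or a stack slot, whereas once it sits in the Query that is forwarded anyway, it is only a field load away at the places that actually use it.

  // Hypothetical illustration, not the actual ValueTracking code.
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/User.h"
  #include "llvm/IR/Value.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  namespace {
  struct Query { const DataLayout &DL; }; // minimal stand-in for the real one

  // Before: DL is live across the recursive call, so it must be kept in a
  // callee-saved register or spilled on entry and reloaded before the call.
  unsigned walkBefore(const Value *V, const DataLayout &DL, unsigned Depth) {
    const auto *U = dyn_cast<User>(V);
    if (Depth == 6 || !U || U->getNumOperands() == 0)
      return Depth;
    return walkBefore(U->getOperand(0), DL, Depth + 1); // DL forwarded each time
  }

  // After: only Q (which was forwarded already) stays live across the call;
  // DL is reached as Q.DL where the layout is actually consulted, so one
  // fewer value has to be preserved per recursion level.
  unsigned walkAfter(const Value *V, unsigned Depth, const Query &Q) {
    const auto *U = dyn_cast<User>(V);
    if (Depth == 6 || !U || U->getNumOperands() == 0)
      return Depth;
    (void)Q.DL.getPointerSizeInBits(); // e.g. where a pointer size is needed
    return walkAfter(U->getOperand(0), Depth + 1, Q);
  }
  } // end anonymous namespace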

lib/Analysis/ValueTracking.cpp
461 (On Diff #44940)

This is necessary because llvm::isValidAssumeForContext() constructed an ad-hoc Query to transmit these two values; however, we do not have a DataLayout at that point and don't actually need it here anyway.
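
To illustrate the constraint (hypothetical names, not the actual code): once Query carries a DataLayout reference, it can no longer be built ad hoc at a call site that has no DataLayout available, so the helper takes the two values it needs directly.

  // Hypothetical sketch, not the actual LLVM code; the helper name is invented.
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Instruction.h"
  using namespace llvm;

  namespace {
  struct Query {
    const DataLayout &DL;    // a reference member must be bound at construction
    const Instruction *CxtI;
    const DominatorTree *DT;
  };

  // A caller that only has CxtI and DT can no longer wrap them in a Query,
  // and this path never uses the layout anyway, so the helper simply takes
  // the two values directly:
  bool isValidAssumeForContextHelper(const Instruction *Inv,
                                     const Instruction *CxtI,
                                     const DominatorTree *DT);
  } // end anonymous namespace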

mehdi_amini accepted this revision.Jan 14 2016, 4:04 PM
mehdi_amini added a reviewer: mehdi_amini.

LGTM.
(I added the DataLayout argument to computeKnownBits last year; I'm fine with adding it to the query.)

This revision is now accepted and ready to land.Jan 14 2016, 4:04 PM
reames added a subscriber: reames.Jan 14 2016, 4:35 PM

I have no objection to the patch, but I am curious to know why you see a speedup here. I'd really expect a combination of inlining and register allocation tricks* to cut the cost of the argument passing to near zero. If it isn't, that seems like something we should fix in the compiler. (I'm assuming you're compiling with a recent Clang here?)

  * In many of these cases, the argument register doesn't need to be preserved over the inner call because a) it's not a callee-saved register and b) it's not used after the return.

Have you looked at the resulting assembly to see where the time is spent?

Philip

I looked at it a bit in valgrind/callgrind. There are indeed 0.7% fewer instructions executed overall.
Looking at the assembly, the DataLayout parameter is spilled at the beginning of each function and reloaded in front of nearly all recursive calls. According to valgrind, this saves 2-3% of the instructions in the various computeKnownBits functions, and, surprisingly, these functions are high enough in the profile that this adds up to the 0.7% saved overall.

This revision was automatically updated to reflect the committed changes.