This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] Improve common case efficiency of static global initializer evaluation
ClosedPublic

Authored by aemerson on Jan 26 2018, 7:10 PM.

Details

Summary

For very, very large global initializers which can be statically evaluated, the existing code would create vectors of temporary Constants, modifying them in place, before committing the resulting Constant aggregate to the global's initializer value. This had effectively O(n^2) complexity in the size of the global initializer and would cause memory and non-termination issues compiling some workloads.

This change performs the static initializer evaluation and creation in batches, once for each global in the evaluated IR memory. The existing code is maintained as a last resort when the initializers are more complex than simple values in a large aggregate. This should theoretically by NFC, no test as the example case is massive. The existing test cases pass with this, as well as the llvm test suite.

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.Jan 26 2018, 7:10 PM
aprantl added inline comments.
lib/Transforms/IPO/GlobalOpt.cpp
2255

Could you please add a Doxygen comment explaining what this function does?

aemerson updated this revision to Diff 131827.Jan 29 2018, 10:37 AM

Updated with some more comments.

Thank you drilling into this! I have a few questions below. Also, could you comment on the time savings you measured for your implementation?

-Gerolf

lib/Transforms/IPO/GlobalOpt.cpp
2280

Wouldn't it be better to have 3 vectors - GVs, SimpleCEs and ComplexCEs?

2308

The logic is fishy. When CurrentGV is the nullptr ForceCommit is still pushing through (or hit the assert, of course, when it is present). What if there is only one element to initialize? Could this result in a null ptr dereference?

2337

"CI" would be more consistent with "GV" than "CU"

2342

Isn't LastPair == CurrentGV?

Thank you drilling into this! I have a few questions below. Also, could you comment on the time savings you measured for your implementation?

-Gerolf

With this change, the test case I was using completed compiling in about 45 seconds, a significant portion of which was spent in the front-end/elsewhere in the compiler.

I don't have a baseline to compare against since I ran out of patience after 10 minutes or so. However if my math is right, for the 127k element initializer, we were doing a copy of the entire array for each element of the initializer. Bringing it down to linear time from O(n^2) I estimate it would have taken at least 66 days to complete, probably more.

lib/Transforms/IPO/GlobalOpt.cpp
2280

Yes that's plausible.

2308

I don't think so. If there's one element only then CurrentGV will be pointing to that element, but the cache will need committing. At that point, CurrentGV == GV and ForceCommit == true.

2337

Sure.

2342

Yes I think you're right.

aemerson updated this revision to Diff 132097.Jan 30 2018, 8:39 PM

Addressed feedback.

This is very close now. Could you add an explicit examples (eg show the IR) showing which initialization remain slow (Complex) and which are fast now? This should also address the spirit of Adrian's question I think.

Thanks
Gerolf

lib/Transforms/IPO/GlobalOpt.cpp
2301

I admit it still hurts my eye to see back to back checks for CurrentGV and ForceCommit. How about this: I think in your final instance of the lambda it can be guaranteed that CurrentGV != null. So there could be a bool parameter 'bool Update'(instead of ForceCommit) and pass GV !=CurrentGV to Update, while on the final call Update is passed true.

This is very close now. Could you add an explicit examples (eg show the IR) showing which initialization remain slow (Complex) and which are fast now? This should also address the spirit of Adrian's question I think.

Thanks
Gerolf

Sure I'll a detailed comment, this is a tricky thing to get right, and testing is awkward because we rely on LLVM statically evaluating the initializers in order to trigger this code.

lib/Transforms/IPO/GlobalOpt.cpp
2301

Fair point. I'll make that change.

aemerson updated this revision to Diff 132220.Jan 31 2018, 11:12 AM

Simplified the logic a bit and added an example in the function documentation of complex and simple addresses.

I assume you will take care of the comment. LGTM.

Very nice work. Thanks!

-Gerolf

lib/Transforms/IPO/GlobalOpt.cpp
2339

Update && CurrentGV?

aemerson added inline comments.Jan 31 2018, 2:30 PM
lib/Transforms/IPO/GlobalOpt.cpp
2339

I've simplified it to this since there's 4 possible cases:

  1. First iteration: Update is true (GV != CurrentGV) and CurrentGV == nullptr.
  2. Same GV as existing cache: Update is false (GV == CurrentGV), CurrentGV != nullptr;
  3. New GV so committing cache: Update is true (GV != CurrentGV) and CurrentGV != nullptr.
  4. Last iteration: Update forced to true, CurrentGV != nullptr.

Because of case 1 we shouldn't check for && CurrentGV

This revision was not accepted when it landed; it landed in state Needs Review.Jan 31 2018, 4:00 PM
This revision was automatically updated to reflect the committed changes.