This is an archive of the discontinued LLVM Phabricator instance.

Reduce dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%
ClosedPublic

Authored by stephan.yichao.zhao on Aug 5 2020, 5:11 PM.

Details

Summary

The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.

Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.

A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.

Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.

Our evaluation shows that

  1. The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
  2. For targets with LTO time > 20min, the time reduction is about 20%.
  3. No observable performance impact for build without using LTO.


Diff Detail

Event Timeline

stephan.yichao.zhao requested review of this revision.Aug 5 2020, 5:11 PM
mehdi_amini added inline comments.Aug 5 2020, 8:18 PM
llvm/lib/IR/LLVMContextImpl.cpp
159

So can you use a for-range loop?

161

It actually isn't clear to why you need the Deleted set at all.

Can't we just use this loop to prime the worklist with the entries that aren't used and then process the worklist below?

for (ConstantArray *Constant : ArrayConstants) {
  if (C->use_empty())
    WorkList.insert.COp);
}

while (!WorkList.empty()) {
  // unchanged...
jdoerfert added a subscriber: jdoerfert.EditedAug 5 2020, 8:49 PM

What is "cl/324220676" and how do I see it?

llvm/lib/IR/LLVMContextImpl.cpp
149

Please proof read this comment.

ychen added a subscriber: ychen.Aug 5 2020, 10:17 PM

What is time cost ratio?

llvm/lib/IR/LLVMContextImpl.cpp
149

I'd go with "When ArrayConstants are of substantial size and only a few in them is dead, starting WorkList with all elements of ArrayConstants can be wasteful. Instead, starting WorkList with only elements that have empty uses."

addressed comments.

stephan.yichao.zhao marked 4 inline comments as done.

reverted unrelated change

stephan.yichao.zhao edited the summary of this revision. (Show Details)Aug 6 2020, 12:30 AM
stephan.yichao.zhao edited the summary of this revision. (Show Details)Aug 6 2020, 12:35 AM
stephan.yichao.zhao edited the summary of this revision. (Show Details)

What is time cost ratio?

updated the description.

llvm/lib/IR/LLVMContextImpl.cpp
149

Rewrote comments after simplifying the code.

Also removed cl/324220676 from the summary. It is not related. Uploaded some pprof results.

161

Thank you. Removed "Deleted" and applied your suggested code.

stephan.yichao.zhao retitled this revision from Improve dropTriviallyDeadConstantArrays time cost ratio from 17% to 4% to Improve dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%.Aug 6 2020, 12:43 AM
stephan.yichao.zhao edited the summary of this revision. (Show Details)

answered comments.

What is "cl/324220676" and how do I see it?

Rewrote comments after simplifying the code.

Also removed cl/324220676 from the summary. It is not related. Uploaded some pprof results.

stephan.yichao.zhao retitled this revision from Improve dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4% to Reduce dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%.Aug 6 2020, 12:48 AM
mehdi_amini accepted this revision.Aug 6 2020, 12:58 AM
This revision is now accepted and ready to land.Aug 6 2020, 12:58 AM
jdoerfert accepted this revision.Aug 6 2020, 8:53 AM

LGTM. Nice and small :) Thx!

tejohnson added inline comments.Aug 6 2020, 10:22 AM
llvm/lib/IR/LLVMContextImpl.cpp
147

Would it make sense to also guard insertion here with COp->use_empty()? You can then remove the C->use_empty() check earlier in this loop, which is redundant for ConstantArrays inserted by the new loop you added above.

llvm/lib/IR/LLVMContextImpl.cpp
147

{

}

In the snapshot,

  1. an arrow from C1 to C4 means C4 is C1's operand. It seems the 'use' in LLVM means 'reference'. So if an element appears at any side of an arrow, it is 'used'. Please fix me if my assumption is wrong.
  2. C{i} is ConstantArray. X and Y are other constant like ConstantStruct, ConstantVector, etc

if we move the use_empty before insert, all C{i} can still be removed the same way. For example, although C4 is not inserted after C1 is deleted before C2 is deleted, after C2 is deleted, C4 can still be inserted.

The question is if the case of C5 exists. Before C3 is deleted and after C2 is deleted, C5 is still used, so not put into WorkList. When C3 is being deleting, Constant::destroyConstant also tries to delete X. After X is deleted, C5 is not used. We are not able to remove C5 since dropTriviallyDeadConstantArrays does not know this.

I am not sure if this can happen. From the original problem the function to address, such a case, if exists, is not related. Also given our test, in a module with 50k constant arrays, less than 10 operands could be inserted. So it does not change memory reduction rate too much if we move use_empty check close to insert.

The current code change ensures it removes the same number of ConstantArrays as the previous approach with less time overhead. Maybe we can leave whether insert only when use_empty as the next potential CL with more profiling?

tejohnson accepted this revision.Aug 6 2020, 12:19 PM

lgtm

llvm/lib/IR/LLVMContextImpl.cpp
147

When C3 is being deleting, Constant::destroyConstant also tries to delete X. After X is deleted, C5 is not used

But could C3 be deleted here if it is still used by X?

However, I see the flaw in what I suggested - presumably all of C's operands are still used before C is destroyed at the bottom of this loop. So none would have use_empty() yet, and so we need to conservatively add them to the worklist in case C is the only use (in which case after C is destroyed they will then have their uses empty). I suppose it could check if there is a single use, and only add it then, but it sounds like it might not be worth it.

This revision was landed with ongoing or failed builds.Aug 7 2020, 11:37 AM
This revision was automatically updated to reflect the committed changes.