Page MenuHomePhabricator

Improve merging of stores from static constructors in GlobalOpt

Authored by inolen on Jul 14 2015, 2:48 PM.



While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking upwards of ~10 minutes to compile the lookup table. I generated a smaller test case ( that, after running with -ftime-report, pointed fingers at GlobalOpt and MemCpyOptimizer.

Running globalopt took around ~9 minutes. The slowdown came from how GlobalOpt merged stores from static constructors individually into the global initializer in EvaluateStaticConstructor. For each store it discovered and wanted to commit, it would copy the existing global initializer and then merge in the individual store. I changed this so that stores are now grouped by global, and sorted from most significant to least significant by their GEP indexes (e.g. a store to GEP 0, 0 comes before GEP 0, 0, 1). With this representation, the existing initializer can be copied and all new stores merged into it in a single pass.

With this patch and, the lookup table that was taking ~10 minutes to compile now compiles in around 5 seconds. I've ran 'make check' and the test-suite, which all passed.

I'm not really sure who to tag as a reviewer, Lang mentioned that Chandler may be appropriate.

Diff Detail


Event Timeline

inolen updated this revision to Diff 29714.Jul 14 2015, 2:48 PM
inolen retitled this revision from to Improve merging of stores from static constructors in GlobalOpt.
inolen updated this object.
inolen added a reviewer: chandlerc.
inolen set the repository for this revision to rL LLVM.
inolen added a subscriber: llvm-commits.
nlewycky added inline comments.
1968 ↗(On Diff #29714)

Typo, "least signifcant" -> "least significant".

1968 ↗(On Diff #29714)

You don't define what significant means here? It appears to mean lowest start offset first then largest range?

1969–1970 ↗(On Diff #29714)

Is the map allowed to contain both? What about overlapping GEP expressions (you can achieve this by having one GEP with more indices than another)?

2015–2020 ↗(On Diff #29714)

This will assert on the fully generalized possibilities of an llvm::Constant (for instance a ConstantExpr bitcast of an llvm::Function).

That couldn't happen when isSimpleEnoughPointerToCommit(Addr) is true, but I don't see any reason why, by construction, that is guaranteed to happen?

2038 ↗(On Diff #29714)

Please include a short message explaining why this invariant exists. For example:

assert(R.second && "Adding store to a global that isn't mutated?");
2212–2213 ↗(On Diff #29714)

What if the address is a bitcast? If Addr is a bitcast of i32* @GV to i8*, then it would only clobber the first byte of the initializer.

2237–2238 ↗(On Diff #29714)

This is a functionality change (unless it's dead code). Could you please add a test for it? Better, can it be hoisted out of this change and submitted in advance?

2242 ↗(On Diff #29714)


llvm_unreachable("Unexpected initializer type");

here. See .

2289 ↗(On Diff #29714)

Also that MG.Stores.empty() ?

inolen updated this revision to Diff 30169.Jul 20 2015, 10:23 AM

Round 2 of the patch here.

I've stopped mixing stores to the global and stores to a GEP of a global in the map. It made sorting and merging more confusing than it needed to be. I believe this should address the comments around the poor wording.

The map now stores instances of GEPOperator, no more casting Constants in the merge function.

GetGlobalForPointer has been updated to match the logic inside of isSimpleEnoughPointerToCommit.

Regarding the comment on a change in functionality, I don't believe there is any there. EvaluateStoreInto appears to have the same logic for Array / Vector types copied inside of the Structure type block.

Added llvm_unreachable in places instead of assert(false).

This looks nearly ready to land. I've convinced myself that "MergePendingStores" is doing the right thing.

1974 ↗(On Diff #30169)


assert(cast<ConstantInt>(A->getOperand(1))->isZero() && "GEP A steps over object");

and another for B? Something to make it clear why int i starts at 2.

2003 ↗(On Diff #30169)

80 columns! I heartily recommend clang-format.

2011–2012 ↗(On Diff #30169)

Couldn't CE->getOpcode() == Instruction::BitCast also reach here?

2044 ↗(On Diff #30169)


2116 ↗(On Diff #30169)
2187 ↗(On Diff #30169)


Thanks again for the review!

2011–2012 ↗(On Diff #30169)

As far as I can tell, it shouldn't.

If the store pointer is a bitcast (and it's not folded away), it gets through to the block we were discussing in IRC. If the logic in that block is successful, the pointer is converted into a GEP expression before being added to the map:

inolen updated this revision to Diff 30219.Jul 20 2015, 5:24 PM

Added the mentioned asserts, fixed typos and ran clang-format on the new code.

nlewycky accepted this revision.Jul 20 2015, 7:50 PM
nlewycky added a reviewer: nlewycky.


2011–2012 ↗(On Diff #30219)


This revision is now accepted and ready to land.Jul 20 2015, 7:50 PM
This revision was automatically updated to reflect the committed changes.