This is an archive of the discontinued LLVM Phabricator instance.

[BranchProbabilityInfo] Use SmallVector (NFC)
ClosedPublic

Authored by kazu on Nov 7 2020, 3:11 PM.

Details

Summary

This patch simplifies BranchProbabilityInfo by changing the type of
Probs.

Without this patch:

DenseMap<Edge, BranchProbability> Probs

maps an ordered pair of a BasicBlock* and a successor index to an edge
probability.

With this patch:

DenseMap<const BasicBlock *, SmallVector<BranchProbability, 2>> Probs

maps a BasicBlock* to a vector of edge probabilities.

BranchProbabilityInfo has a property that for a given basic block, we
either have edge probabilities for all successors or do not have any
edge probability at all. This property combined with the current map
type leads to a somewhat complicated algorithm in eraseBlock to erase
map entries one by one while increasing the successor index.

The new map type allows us to remove the all edge probabilities for a
given basic block in a more intuitive manner, namely:

Probs.erase(BB);

Diff Detail

Event Timeline

kazu created this revision.Nov 7 2020, 3:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2020, 3:11 PM
kazu requested review of this revision.Nov 7 2020, 3:11 PM

It is true that SmallVector is more convenient to use but this change also has potential to increase memory usage. Have you checked whether this can increase memory usage?

kazu added a comment.Nov 7 2020, 6:25 PM

It is true that SmallVector is more convenient to use but this change also has potential to increase memory usage. Have you checked whether this can increase memory usage?

The differences in memory consumption seem to be really small.

I have invoked clang for three input files, three times each, and looked at Maximum resident set size (kbytes) with /usr/bin/time -v.

JumpThreading.ii
Without this patch: 280792 281016 281392
With this patch:    280736 281484 281568

SROA.ii
Without this patch: 311820 312412 312488
With this patch:    311224 311356 311468

LICM.ii
Without this patch: 346300 346380 348356
With this patch:    345528 348084 348564

Under the hood, the difference should be a wash because:

DenseMap<std::pair<const BasicBlock *, unsigned>, BranchProbability>::value_type

is 24 bytes in size, padded to 32 bytes, and

DenseMap<BasicBlock *, SmallVector<BranchProbability, 2>>::value_type

is 32 bytes in size.

If we assume that we have a function with 100 "if" statements with no switch statements at 50% occupancy of the hash table, we have:

Without this patch: 32 bytes/entry * 200 edges        / 50% occupancy = 12,800 bytes
With this patch:    32 bytes/entry * 100 basic blocks / 50% occupancy =  6,400 bytes

It is true that SmallVector is more convenient to use but this change also has potential to increase memory usage. Have you checked whether this can increase memory usage?

The differences in memory consumption seem to be really small.

I have invoked clang for three input files, three times each, and looked at Maximum resident set size (kbytes) with /usr/bin/time -v.

JumpThreading.ii
Without this patch: 280792 281016 281392
With this patch:    280736 281484 281568

SROA.ii
Without this patch: 311820 312412 312488
With this patch:    311224 311356 311468

LICM.ii
Without this patch: 346300 346380 348356
With this patch:    345528 348084 348564

Under the hood, the difference should be a wash because:

DenseMap<std::pair<const BasicBlock *, unsigned>, BranchProbability>::value_type

is 24 bytes in size, padded to 32 bytes, and

DenseMap<BasicBlock *, SmallVector<BranchProbability, 2>>::value_type

is 32 bytes in size.

If we assume that we have a function with 100 "if" statements with no switch statements at 50% occupancy of the hash table, we have:

Without this patch: 32 bytes/entry * 200 edges        / 50% occupancy = 12,800 bytes
With this patch:    32 bytes/entry * 100 basic blocks / 50% occupancy =  6,400 bytes

Thanks for the explanation!

llvm/lib/Analysis/BranchProbabilityInfo.cpp
1098

const std::vector<...> & (specify the concrete type if unclear)

1120

const std::vector<...> & (specify the concrete type if unclear)

kazu updated this revision to Diff 303692.Nov 7 2020, 7:55 PM

I've specified for SrcProbs.

kazu marked 2 inline comments as done.Nov 7 2020, 7:56 PM

This looks to be a good change towards simplification. I think originally a map over edges was introduced by @atrick back in 2011 in https://github.com/llvm/llvm-project/commit/49371f3f33788 as it could be sparse. But with the commit https://gitlab.azulsystems.com/dev/orca/commit/8138487468e22 we imposed an important restriction on setting outgoing edge probabilities at once for one block. This allowed correctness checks and further improvements like this.

As of performance impact of this patch I would suggest that you check llvm-compile-time-tracker.com out or contact @nikic directly.

llvm/lib/Analysis/BranchProbabilityInfo.cpp
1172–1175

remove braces

kazu updated this revision to Diff 303933.Nov 9 2020, 10:49 AM

Removed unnecessary curly braces.

kazu marked an inline comment as done.Nov 9 2020, 1:22 PM
kazu added a comment.Nov 9 2020, 3:07 PM

@nickc kindly measured the compilation time for me:

https://llvm-compile-time-tracker.com/compare.php?from=dd5b51f4fabbb44c2f6629f0dafefa7b7f3f0b23&to=2d731aa30cd9f27daedae206d4cb0de8a4fe3ad8&stat=instructions

shows a very slight improvement.

Qualitatively speaking, we are making the data structure more contiguous while maintaining the initial size of Probs. Once we start inserting edge probabilities, then the SmallVector approach should come out positive (that is, less memory) because a basic block ending with an "if" instruction takes up a single hash entry as opposed to two.

MaskRay accepted this revision.Nov 9 2020, 4:32 PM

Thanks!

This revision is now accepted and ready to land.Nov 9 2020, 4:32 PM
This revision was landed with ongoing or failed builds.Nov 9 2020, 5:30 PM
This revision was automatically updated to reflect the committed changes.
atrick added a comment.Nov 9 2020, 9:22 PM

I won't weigh in decisively, but it does look like a positive change to me.

When it comes to designing for compile time and memory usage, it's more important to protect against the worst-case (pathological input) than streamlining the average case--there's typically one straggler function that takes excessive time while the rest can be easily parallelized. That was the motivation behind avoiding allocations that may scale with the function size.

On the other hand, the worst-case with this change--a very large number of switches in the same function---seems truly pathological, and the consequence is not so bad--it's just a linear increase in allocations and memory.

When in doubt, the simpler implementation should be preferred.

dmajor added a subscriber: dmajor.Nov 10 2020, 5:18 PM

Our bots are failing on the new assertion "The number of edge probabilities must match the number of successors." during PGO builds of clang.

The backtrace and reproducers are available from https://treeherder.mozilla.org/logviewer?job_id=321399701&repo=try&lineNumber=50587 (you may need to click "Show Job Info" in the top bar to get the files).

yrouban added inline comments.Nov 15 2020, 10:08 PM
llvm/lib/Analysis/BranchProbabilityInfo.cpp
1121

could you please extract this assertion and the one at line 1099 to a separate patch that the rest NFC should depend on? This new extracted patch would be a functional change that impose a stronger restriction on BPI correctness.

kazu added inline comments.Nov 15 2020, 10:12 PM
llvm/lib/Analysis/BranchProbabilityInfo.cpp
1121

I'll do that. Again, let me do additional testing on my side and fix whatever issues come up before I'll post the assertion-only patch.