This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Merge conditional stores
AbandonedPublic

Authored by jmolloy on Oct 13 2015, 7:30 AM.

Details

Summary

We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:

if (c & flag1)
  *a = x;
if (c & flag2)
  *a = y;
...

There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.

It is, however, legal to merge the stores together and do the store once:

tmp = undef;
if (c & flag1)
  tmp = x;
if (c & flag2)
  tmp = y;
if (c & flag1 || c & flag2)
  *a = tmp;

The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.

This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.

The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.

This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.

LNT diff: one test regresses by 2%, another improves by 3%.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy retitled this revision from to [SimplifyCFG] Merge conditional stores.Oct 13 2015, 7:30 AM
jmolloy updated this object.
jmolloy added reviewers: sanjoy, majnemer.
jmolloy updated this revision to Diff 37250.Oct 13 2015, 7:30 AM
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
jmolloy added a subscriber: mcrosier.
reames added a subscriber: reames.Oct 14 2015, 2:37 PM
reames added inline comments.
include/llvm/Transforms/Utils/Local.h
141 ↗(On Diff #37250)

This seems potentially dangerous. Since SimplifyCFG doesn't preserve AA, are there any situations where a transform could have invalidated cached information, and then your transform runs?

lib/Transforms/Utils/SimplifyCFG.cpp
2411

Is there a simple form of this which doesn't require anything other than trivial AA? If so, I'd start with that, get the transform working and in, then generalize.

2440

There's a good chance that pstore isn't from the conditional block at all. You might want to check dominance. Optionally, you can speculate pstore out of the conditional block is it's safe to speculate.

2465

This long wall of code is hard to follow. Please use some well named helper functions.

2509

One think you might consider: are there cases where we can insert a store down the other path? Doing so in general is clearly a violation of both dereferenceability and the memory model, but what about a conditional store to a dereferenceable location which is known to be thread local?

I ran across a similar case in LICM's store promotion which I'm thinking about implementing since it would really help one of my benchmarks. Just throwing out the idea in case you find it helpful.

2563

Spacing. Clang-format?

2566

This chain of checks is hard to read. Could it be restructured along with the above code to make it simpler?

2575

Some of these are legality, some are profitability, some are implementation limits. Please separate and comment each class.

Forgot to say: very interesting transform. I'm glad to see you're proposing this. I'm a bit concerned about the profitability aspects though. You might want to further restrict the conditions to be things which we know can be combined/folded cheaply.

jmolloy updated this revision to Diff 38004.Oct 21 2015, 6:34 AM
jmolloy marked an inline comment as done.

Hi Philip,

Thanks very much for the review. A new version is attached.

Cheers,

James

jmolloy marked 4 inline comments as done.Oct 21 2015, 6:36 AM
jmolloy added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
2440

Actually, PStore must come from a conditional block (see line 2410).

2509

This is a good possible extension to this optimization! Probably best to wait until it's gone in in its current, more simple form though.

sanjoy edited edge metadata.Oct 24 2015, 12:46 AM

I haven't grokked the whole patch yet, but some comments inline based off of what I've understood so far. Also, please upload the diff with full context.

lib/Transforms/Utils/SimplifyCFG.cpp
2385

These are suggestions, I'm not a 100% sure that these will actually make the code more readable:

  • Extract out a BasicBlock *OtherIncomingBlock variable
  • Put an assert in the loop verifying that there are only two incoming blocks (or remove the loop and have an if/else in its place). IOW, make it obvious that we're not dealing with blocks with an arbitrary number of preds.
2416

I think you need QStore->getPointerOperand()->getType() here. All StoreInst s have the type void.

2450

What about things like AtomicRMWInst, AtomicCmpXchgInst and FenceInst?

I think you're better off querying Instruction::mayReadOrWriteMemory.

2517

However, this is a pessimization if both conditions are always false, and the resulting code does not simplify further, right?

2597

Nit: spacing should be SmallVector<Value *, 4>.

2598

I'm not sure this is correct -- std::set_intersection expects both the ranges to be sorted, and I don't think SmallPtrSet is guaranteed to iterate over values in sorted order. I think it is best to use llvm::set_intersect here.

2662

Wrap the line?

Actually, I'll just assume you'll run clang-format before checkin. :)

3621

Unrelated whitespace changes?

5101

Whitespace damage?

sanjoy requested changes to this revision.Oct 24 2015, 1:23 AM
sanjoy edited edge metadata.
This revision now requires changes to proceed.Oct 24 2015, 1:23 AM
jmolloy updated this revision to Diff 38421.Oct 26 2015, 8:22 AM
jmolloy edited edge metadata.

Hi Sanjoy,

Thanks for the review! New patch attached (with full context, sorry!).

Cheers,

James

jmolloy marked 5 inline comments as done.Oct 26 2015, 8:27 AM
jmolloy added inline comments.
include/llvm/ADT/SetOperations.h
42 ↗(On Diff #38421)

This was required because SmallPtrSet doesn't have ::key_type. auto is obviously a better way to do this. I'll obviously apply this as a separate commit.

lib/Transforms/Utils/SimplifyCFG.cpp
2416

Ouch! good catch, thanks!

2517

In that case, yes it is. The important part is "if the code doesn't simplify further" though - even if both conditions are always false, if we can if-convert this is very likely to be a win.

The heuristics at the moment are trying to catch cases where we know we can if-convert.

2662

I've been selectively running clang-format on the bits I've touched, and missed this bit :(

5101

Yes, this was me undoing a change and blindingly running clang-format. More below, where clang-format has ripped trailing whitespace away :(

sanjoy requested changes to this revision.Oct 27 2015, 1:38 AM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
2352

I'd just iterate through the two blocks instead of iterating over all of Address's uses, unless you have reason to believe that Address will have only a small number of uses. Also see comment on IsWorthwhile.

2366

Have you considered using SSAUpdater here? This looks like its duplicating logic that already exists there.

Sorry for not bringing this up earlier!

2417

Why do you need to check the types? Aren't they both stores to Address?

2424

When you say "if-converted" what transform are you talking about specifically? Is it CodeGen/EarlyIfConversion.cpp or CodeGen/IfConversion.cpp or something else?

2426

If you move this check on the block size to before the calls to findUniqueStoreInBlocks then you'll know in findUniqueStoreInBlocks that the blocks are guaranteed to be small (assuming MergeCondStoresAggressively is false) and scanning through all of the instructions in the blocks won't hurt much.

2429

Might be better to have this as a whitelist -- "all instructions that do not touch memory and the stores we already know about" or something like that.

2450

I'm not sure what the lambda adds here -- why not directly call mayReadOrWriteMemory?

2476

Why do you need to do this? The only uses of TB I see are the XStore->getParent() == XTB checks; and they should be fine with a nullptr XTB, no?

2487

This one isn't used.

2544

Minor nit: I'd use InvertPCond as "swapping" a predicate means something slightly different in LLVM.

2566

Why do you need to check BB for nullptr?

2582

You can shorten this using initializer lists:

for (auto *BB : { PTB, PFB }) {
  if (!BB) continue;
  for (auto &I : *BB)
    if (StoreInst *SI = ...

and possibly even more with initializer lists of std::pair.

This revision now requires changes to proceed.Oct 27 2015, 1:38 AM
jmolloy marked 8 inline comments as done.Oct 27 2015, 5:50 AM

Hi Sanjoy,

Thanks again for this fantastic review! I agree with most of the comments ; patch updated.

Cheers,

James

lib/Transforms/Utils/SimplifyCFG.cpp
2366

I've just tried using SSAUpdater. Actually it really doesn't help much - the search for an appropriate already-existing PHI here is more than SSAUpdater can do itself, so replacing this function with SSAUpdater pessimizes some code (we end up with another PHI, which becomes another select).

2424

I'm talking about the if-conversion done by SimplifyCFG itself. SimplifyCFG refers to this as PHI node folding (see -phi-node-folding-threshold).

2429

This kind of already is a whitelist, is it not? We're saying that if there's any instruction that isn't one of those types, it's not worthwhile. Those are explicitly the instructions that we believe can be if-converted.

StoreInsts don't matter because we know (from findUniqueStoreInBlocks) that there can only be zero or one store.

2566

I don't. Removed.

2582

Unfortunately type deduction wasn't clever enough to work out an initializer list of std::pair (and it turns out the std::pair elements end up being const too), so I've just done the simpler transform.

jmolloy updated this revision to Diff 38534.Oct 27 2015, 6:18 AM
jmolloy edited edge metadata.
jmolloy removed rL LLVM as the repository for this revision.
jmolloy marked an inline comment as done.

Hi Sanjoy,

Do you have time to do another review round on this patch?

Cheers,

James

sanjoy requested changes to this revision.Nov 2 2015, 7:04 PM
sanjoy edited edge metadata.

I think this is fairly close to be ready to check in; mostly minor stuff inline.

While I'm comfortable giving a final LGTM once these comments are addressed, it will be more preferable if you can recruit someone more familiar with SimplifyCFG than me to take a final look.

lib/Transforms/Utils/SimplifyCFG.cpp
2353

I'd use an early continue here, will reduce the nesting a bit.

2356

This check isn't necessary any more -- SI will always be either in BB1 or BB2.

2383

This will insert a redundant PHI if the first PHI in Succ only has one correct incoming value, but is followed by a PHI that has both V and AlternateV correctly as incoming values. How about extracting out a lambda that checks if a phi node does what you want (both for V and AlternateV) and use that in this loop?

This is minor though, and if you don't prefer doing this then I'm fine with it.

2414

Fix the wrapping here.

2558

Nit: "dominating"

This revision now requires changes to proceed.Nov 2 2015, 7:04 PM
jmolloy marked 5 inline comments as done.Nov 3 2015, 5:09 AM

Hi Sanjoy,

Thanks again for the review! I'll try and get David or Hal to give it a once-over before I commit.

Cheers,

James

jmolloy updated this revision to Diff 39048.Nov 3 2015, 5:24 AM
jmolloy edited edge metadata.
jmolloy set the repository for this revision to rL LLVM.
hfinkel added a subscriber: hfinkel.Nov 3 2015, 7:54 AM
hfinkel added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
2349

This function signature and comment are out of date (Address is not used here).

2368

Please comment on what AlternativeV means (when it is not null).

2388

This seems somewhat dangerous as an assertion, unless you specifically check it as a precondition somewhere. Otherwise, it can be violated in non-obvious ways (the block could have its address taken, for example).

2423

You also need to skip debug intrinsics. Bitcasts. but only on pointer types, are probably also a good idea.

2499

You're not preserving here any of the (aliasing) metadata that might have been present on the stores.

jmolloy updated this revision to Diff 39194.Nov 4 2015, 5:33 AM
jmolloy edited edge metadata.
jmolloy marked 4 inline comments as done.

Addressed Hal's comments.

hfinkel accepted this revision.Nov 4 2015, 7:16 AM
hfinkel added a reviewer: hfinkel.

LGTM.

jmolloy abandoned this revision.Nov 4 2015, 7:33 AM

Thanks Hal, committed in r252051.