This is an archive of the discontinued LLVM Phabricator instance.

Add a speculative execution pass
ClosedPublic

Authored by broune on Apr 29 2015, 5:43 PM.

Details

Summary

This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.

Credit goes to Jingyue Wu for writing an earlier version of this pass.

Patched by Bjarke Roune.

Diff Detail

Event Timeline

broune updated this revision to Diff 24670.Apr 29 2015, 5:43 PM
broune retitled this revision from to Add a speculative execution pass.
broune updated this object.
broune edited the test plan for this revision. (Show Details)
broune added subscribers: Unknown Object (MLST), jholewinski.

It seems this pass has a good deal in common with SimplifyCFG. How is this
pass different from bumping up the cost threshold in that pass?

It seems this pass has a good deal in common with SimplifyCFG. How is this
pass different from bumping up the cost threshold in that pass?

I assume you are referring to the functionality in the function SpeculativelyExecuteBB in SimplifyCFG. As I read what it does, SimplifyCFG will not speculate if no selects are introduced and it will speculate at most one instruction. It also will not speculate if there is a value defined in the if-block that is only used in the then-block. These restrictions make sense since the speculation in SimplifyCFG seems aimed at introducing cheap selects, while this pass is intended to do more aggressive speculation while counting on later passes to either capitalize on that or clean it up.

What this pass does could be merged into SimplifyCFG, though at first look it doesn't appear to me to be an easy fit. We'd also need two different configurations of SimplifyCFG since aggressive speculation at every point where SimplifyCFG runs would probably be too much; I've experimented with adding this pass in multiple places simultaneously and that hasn't yielded good results so far.

jingyue added inline comments.May 5 2015, 10:52 AM
lib/Transforms/Scalar/SpeculativeExecution.cpp
61

Comment on why this threshold is necessary. My understanding is the NVIDIA driver only speculates a limited number of instructions. If too many instructions are left behind, the conditional basic block won't be executed in the SASS level.

70

Add a constructor that takes spec-exec-max-speculation-cost and spec-exec-max-not-hoisted as parameters. This allows embedded uses of LLVM (e.g., users can programmatically create these passes with their desired thresholds).

114

Can we make a new interface getSingleSuccessor()?

142

Should we consider ConstantExpr free?

181

TotalSpeculationCost

182
for (auto &I : FromBlock)
jingyue added inline comments.May 5 2015, 1:25 PM
lib/Transforms/Scalar/SpeculativeExecution.cpp
61

I meant "won't be speculatively executed".

meheff added inline comments.May 5 2015, 2:07 PM
lib/Transforms/Scalar/SpeculativeExecution.cpp
65

The double negative at the end of this description confused me as written. Maybe: "Speculative execution is not be applied to basic blocks where the number of ... exceeds this limit."

Could change the other option description to match.

147

Stale comment?

160

Instruction::And? Can you use TTI instruction costs here?

hfinkel added inline comments.May 5 2015, 2:28 PM
include/llvm/Transforms/Scalar.h
426

This pass does not speculatively execute anything. Please reword.

lib/Transforms/IPO/PassManagerBuilder.cpp
232

This is pretty early in the pipeline, why? Do you want LICM to run first?

lib/Transforms/Scalar/SpeculativeExecution.cpp
148

And should this depend on the number of non-constant indices?

broune updated this revision to Diff 25005.May 5 2015, 9:41 PM

Added and used BasicBlock::getSingleSuccessor().
Improved flag descriptions.

broune updated this revision to Diff 25007.May 5 2015, 10:13 PM

Use getSingleSuccessor() more.

include/llvm/Transforms/Scalar.h
426

I'm very happy to correct any mistaken terminology. I'm unclear on whether you object to "speculation" or "execution" or both and what you would prefer instead? I note that SimplifyCFG has a function SpeculativelyExecuteBB that also moves instructions out of branches and I did see some sources with a similar terminology, e.g.

"Software Speculative Execution [...]
When the compiler generates speculative code, it moves instructions in front of a branch that previously had protected them from causing exceptions."
http://techpubs.sgi.com/library/dynaweb_docs/0650/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch05.html

lib/Transforms/IPO/PassManagerBuilder.cpp
232

I did start with a late placement, right before the nvptx backend runs. I also tried right before LICM and a bit after LICM and other places. The current placement is what worked best out of those things that I tried.

My intuition of why this early placement works well is that we speculate, optimize and then soon after those instructions that were speculated but did not enable further optimization get put back by InstCombine.

lib/Transforms/Scalar/SpeculativeExecution.cpp
61

Done. The base reason for this limit is that speculating just a few instructions from a larger block tends not to be profitable. That could in part be due to an interaction with the NVIDIA driver, even though this pass is run quite early in the compiler, putting it later makes our benchmarks worse, and speculated instructions that are not further optimized tend to be sunk back by InstCombine to where they were before speculation. In the best case that I've seen for this pass, speculation allowed further follow-on optimizations, collapsing a lot of control flow and bit-fiddling instructions into fewer instructions; that sort of thing is less likely to happen when speculating just a few instructions from a larger block, so that's another reason for the limit. The comment I added fits both reasons.

65

Done.

70

This doesn't seem to be common for the other passes that I looked at in Scalar/ that have flags and it seems like it would be hard to maintain when options are added or removed, so I'd prefer to postpone adding that until someone uses it. I could add it now if you'll be using it right away?

114

Nice idea. Done.

142

Do you mean should we return 0 if all operands to an instruction are ConstantExpr? I'd think most such cases would be folded already, though maybe not for GEP, since I believe that it's required for changing types, so that does seem reasonable to catch.

147

Done.

148

I'm especially keen to speculate GEPs, as that was part of the original motivation for the pass, but you're right that this might not be good. I'll try having the cost be the number of non-constant indices + 1 and get back to you.

160

Since the speculated instructions can be sunk back or optimized later, the optimal score isn't necessarily the time it would take to execute the instructions, it's more propensity to cause good things and not cause bad things when speculated. Though TTI could still be a better approximation to that than what I've got here, so I'll try it both ways and get back to you.

181

Done.

182

Done.

jingyue added inline comments.May 6 2015, 10:15 AM
lib/Transforms/Scalar/SpeculativeExecution.cpp
70

Fine with me. I've seen CFGSimplifyPass, JumpThreading and LoopUnrolling have pass parameters.

160

If TTI's cost model works for you, I'd prefer using that. No need to invent another cost model if an existing one already works.

broune updated this revision to Diff 25075.May 6 2015, 11:47 AM

Use TTI for speculation cost of whitelisted instructions.

I promised to get back to you on a few things. I tried these versions overnight:

  1. no changes,
  2. as 1, but with the cost of a GEP as the number of non-constant indices + 1,
  3. as 1, but with the cost function entirely replaced with TTI->getUserCost,
  4. as 3, but still returning UINT_MAX when 1 does.

1, 2 and 4 were about the same while 3 was worse than the others. So I propose to go with 4 and I have updated the patch to that.

jingyue accepted this revision.May 7 2015, 10:37 AM
jingyue edited edge metadata.

LGTM with a minor

lib/Transforms/Scalar/SpeculativeExecution.cpp
142

I was thinking of the case where I is a ConstantExpr because I is a User which can be Instruction or ConstantExpr. However, I later noticed you only call ComputeSpeculationCost with instructions. So we should make this function to more restrictively take only Instruction.

This revision is now accepted and ready to land.May 7 2015, 10:37 AM
broune updated this revision to Diff 25281.May 7 2015, 9:19 PM
broune edited edge metadata.

Change first parameter type of ComputeSpeculationCost to Instruction *.

lib/Transforms/Scalar/SpeculativeExecution.cpp
142

Done.

hfinkel added inline comments.May 14 2015, 11:41 AM
include/llvm/Transforms/Scalar.h
426

Please reword like this:

// SpeculativeExecution - Aggressively hoist instructions to enable speculatively execution on targets where branches are expensive.

(the import part here is that the verb is 'hoist', not 'execute').

lib/Transforms/IPO/PassManagerBuilder.cpp
232

How exactly do you intend this to work? There are two or three options:

  1. Don't put it in the standard pipeline (assuming that GPU-compilers have their own pipelines anyway and will use it in those custom pipelines only).
  2. Add some TTI hook to control whether or not the pass does anything.
  3. Add a variable to the pass manager controlling this (so that the frontend must decide).

I prefer (2), but just leaving it here, as a something intended to be dead (except for use of some command-line flag) is not a reasonable plan. The command-line flags are for debugging and testing, not for production pass-manager control.

lib/Transforms/Scalar/SpeculativeExecution.cpp
11

Same problem here: the pass does not execute anything, it hoists instructions to cause speculative execution. Please reword.

broune updated this revision to Diff 25829.May 14 2015, 5:12 PM

Remove speculation flag and call from PassManagerBuilder.
Update comments on speculation.

hfinkel accepted this revision.May 14 2015, 5:14 PM
hfinkel edited edge metadata.

LGTM.

broune updated this revision to Diff 25830.May 14 2015, 5:15 PM
broune edited edge metadata.

Fix typo.

broune added inline comments.May 14 2015, 5:21 PM
include/llvm/Transforms/Scalar.h
426

Done, thank you for the clarification.

lib/Transforms/IPO/PassManagerBuilder.cpp
232

I removed the flag and the code it controls, going for option 1 for now. Let me know if anyone interested in this pass needs another option.

jingyue updated this object.May 15 2015, 10:57 AM
jingyue closed this revision.May 15 2015, 10:58 AM