Following the plan described on LLVM-dev (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076732.html), this patch provides a first version
of a new fence-elimination pass. It has several known restrictions/weaknesses,
but I would like some feedback early, hence this review.
The main problem is that there is no easy way to only run the pass on a
function if AtomicExpand introduced fences in that function. This is made
worse by the requirement on BlockFrequencyInfo and BreakCriticalEdges of
this pass:
- There is a non-negligible cost incurred even on code with no atomics/fences
- Because BreakCriticalEdges changes control-flow, it breaks a test (and running SimplifyCFG afterwards which would be nice would break even more tests).
I can see several ways to fix this:
- Merge the pass in AtomicExpand. Beyond the ugliness, this would require a way of running BreakCriticalEdges and BlockFrequencyInfo from inside AtomicExpand which does not seem supported by the current PassManager infrastructure.
- Break critical edges lazily inside this pass. This would still incurs costs for BlockFrequencyInfo; worse it would require updating the block frequency as well, and would add some significant complexity to the pass.
- Have a flag for running this pass and set it to false for the tests this break. This is the easiest but would do nothing for the performance costs.
What would you suggest ?
Some other parts remain to be done, but can be added later, and I hope to first
fix the above issue. For example:
- the pass is only enabled for Power, and not yet for ARM
- the min-cut implementation is rather basic and not super optimized
- the pass does not make use of information passed by AtomicExpand to optimize fences more agressively (for example it cannot sink the fence out of a spinloop yet), details on how I plan to do that later are in the proposal I sent to LLVM-dev some time ago.
Some more questions for reviewers:
- should the min-cut implementation be cut in a separate file ?
- I left lots of DEBUG() statements, should they be removed ?
Thanks for taking the time of reading this !
I'm not a fan of default arguments, I think you shouldn't have them.
Can you add a description of what stronger/weaker mean, as well as FenceArgs?
std::function isn't in common use in the code base, can you see what similar functions use instead?
Overall you're trying to create a partial ordering for target-specific intrinsics for fences, and then manually adding this pass many times to a backend. Could you instead have the backend specify the ordering, and add the pass only once? The pass would handle each fence type that it's made aware of, in the proper strong->weak ordering.