This is an archive of the discontinued LLVM Phabricator instance.

Differential D5758

Add a fence elimination pass
Needs Revision · Public

Authored by morisset on Oct 13 2014, 12:43 PM.

Details

Reviewers

t.p.northover
echristo
jfb
hfinkel
javed.absar

Summary

Following the plan described on LLVM-dev (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076732.html), this patch provides a first version
of a new fence-elimination pass. It has several known restrictions/weaknesses,
but I would like some feedback early, hence this review.

The main problem is that there is no easy way to only run the pass on a
function if AtomicExpand introduced fences in that function. This is made
worse by the requirement on BlockFrequencyInfo and BreakCriticalEdges of
this pass:

There is a non-negligible cost incurred even on code with no atomics/fences.
Because BreakCriticalEdges changes control flow, it breaks a test (and running SimplifyCFG afterwards, which would be nice, would break even more tests).

I can see several ways to fix this:

Merge the pass into AtomicExpand. Beyond the ugliness, this would require a way of running BreakCriticalEdges and BlockFrequencyInfo from inside AtomicExpand, which does not seem to be supported by the current PassManager infrastructure.
Break critical edges lazily inside this pass. This would still incur costs for BlockFrequencyInfo; worse, it would require updating the block frequency as well, and would add significant complexity to the pass.
Have a flag for running this pass and set it to false for the tests this breaks. This is the easiest, but would do nothing for the performance costs.

What would you suggest ?

Some other parts remain to be done, but can be added later, and I hope to first
fix the above issue. For example:

the pass is only enabled for Power, and not yet for ARM
the min-cut implementation is rather basic and not super optimized
the pass does not make use of information passed by AtomicExpand to optimize fences more aggressively (for example, it cannot sink the fence out of a spinloop yet). Details on how I plan to do that later are in the proposal I sent to LLVM-dev some time ago.

Some more questions for reviewers:

Should the min-cut implementation be split into a separate file?
I left lots of DEBUG() statements; should they be removed?

Thanks for taking the time to read this!

Diff Detail

Event Timeline

morisset updated this revision to Diff 14813. Oct 13 2014, 12:43 PM

morisset retitled this revision from to Add a fence elimination pass.

morisset updated this object.

morisset edited the test plan for this revision.

morisset added reviewers: jfb, echristo, rengolin.

morisset added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Oct 13 2014, 12:43 PM

morisset added a reviewer: t.p.northover. Oct 13 2014, 12:53 PM

In the description, could you add a link to the LLVM-dev discussion for this?

What's missing to enable ARM, x86, and other architectures? Could you also use this pass on non-platform-specific fences (C++11 style fences instead)?

It would be nice to have more complex examples with bigger CFG, and with memory accesses (not just atomic ones).

As we discussed yesterday: do you think that it would be easy to generate dot graphs too, to make it easier to see what's going on?

Do you have a feel for the runtime of this pass on large codebases? I understand that there are overheads due to the required passes, but I want to make sure that you try profiling just this pass, to make sure there aren't glaring inefficiencies added.

include/llvm/Transforms/Scalar.h
44	I'm not a fan of default arguments, I think you shouldn't have them. Can you add a description of what stronger/weaker mean, as well as FenceArgs? `std::function` isn't in common use in the code base, can you see what similar functions use instead? Overall you're trying to create a partial ordering for target-specific intrinsics for fences, and then manually adding this pass many times to a backend. Could you instead have the backend specify the ordering, and add the pass only once? The pass would handle each fence type that it's made aware of, in the proper strong->weak ordering.
lib/Target/PowerPC/PPCTargetMachine.cpp
20	Keep these sorted.
lib/Transforms/Scalar/FencesPRE.cpp
53	"before FencesPRE" is less odd. Pre-PRE sounds weird :)
59	`clang-format` the entire file.
95	You don't need to `typedef struct A {} A;` in C++, just `struct A {};` is enough.
113	You don't need `struct` here (C versus C++).
116	You can use a delegating constructor to `FlowGraphNode(Instruction * Inst)`, though it doesn't look like this constructor is used, so you could mark it as `= delete` instead.
171	Does this ever happen? It seems like it's an error and should be an `assert`.
180	Why 1 in all cases? That seems pretty low, especially since this is all stack allocated it's pretty cheap to start with 4 or 8, though real-world numbers are better.
210	What is this loop doing? Putting it in its own function could make things easier to understand.
428	I'd avoid the macros here and in other places. You have a `typedef` for the types, so it would be better to use `std::numeric_limits`.
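
To illustrate the last comment, here is a minimal sketch of replacing a limit macro with `std::numeric_limits`. It reuses the `Capacity` typedef and the capacity formula from the patch's own comments, `(BlockFrequency >> 2) + 1`. The names `InfiniteCapacity` and `edgeCapacity` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Same typedef as in the patch: signed so that it can also represent flow.
typedef int64_t Capacity;

// Hypothetical constant: follows the typedef automatically, unlike a macro
// such as INT64_MAX that must be kept in sync by hand.
const Capacity InfiniteCapacity = std::numeric_limits<Capacity>::max();

// Hypothetical helper implementing the formula from the patch's comment.
// The shift makes the value fit in 63 bits; the +1 accounts for the code
// size cost of a fence even in a block that is never executed.
Capacity edgeCapacity(uint64_t BlockFrequency) {
  return static_cast<Capacity>(BlockFrequency >> 2) + 1;
}
```

If the `Capacity` typedef ever changes width, `InfiniteCapacity` changes with it, which is the main argument for preferring `std::numeric_limits` here.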

Thanks for the review !

Enabling it for ARM/x86 is really easy: it is just a cut-&-paste of the code added by this patch for the Power backend, adapting the arguments to the pass constructor to give it the right fences. I did not do it in this patch to avoid having to change it all if the interface to the pass changes during review (which is looking likely).

I will try to add a bigger example, it is just time consuming and I wanted to get feedback as soon as possible.

For the interaction with dot, it would probably be conceptually easy but require a fair bit of plumbing, I will look at how the -view-isel-dags and friends options work.

The main difficulty with profiling this patch is that the pass will basically do nothing on code without atomics (well, break critical edges followed by block frequency info, followed by going through every instruction and testing them for seeing if they are fences). So I would need a codebase that is at the same time large, full of atomics, and with an easy to hack build system so that I could cross-compile it for power.... So I can test a few small testcases, but I don't see how to do large-scale profiling. If the codebase has no atomic, the cost of this pass is completely dominated by its requirements.

Answers to the inline comments below, I will update this patch to apply your suggestions soon.

include/llvm/Transforms/Scalar.h
44	OK for removing the default argument. For the ordering, I will try to see if having the backend pass an ArrayRef<> of the different fences, sorted by strength, works. The main weakness is that it would make it a total ordering, which is fine for most architectures but may fail for some (off the top of my head only Alpha, which is not supported by LLVM, but I don't know every architecture out there). How could the backend cleanly specify a partial ordering?
lib/Target/PowerPC/PPCTargetMachine.cpp
20	OK
lib/Transforms/Scalar/FencesPRE.cpp
53	OK
59	OK
95	Thanks, I didn't know that.
113	Same
116	I will look into it, I think I had to add this constructor to silence a warning, it should indeed be unused.
171	OK.
180	My reasoning is that most functions will have no fences, so 0 elements for these, or will just be a tiny helper with only one atomic access. But I can easily change it to 4 or 8 as it is so cheap.
210	OK, I will try to put it into its own function, I hoped the comments inside would be enough.
428	OK (sorry I forgot this, I remember you suggested it already).
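
One possible answer to the partial-ordering question, sketched here as an illustration only and not taken from the patch: describe each fence by the set of orderings it enforces, and call fence A at least as strong as fence B when A's set contains B's. Set inclusion is a partial order, so incomparable fences are handled naturally. The type and function names are hypothetical, and the `LoadFence`/`StoreFence` entries are invented to show incomparability. The guarantees listed for hwsync and lwsync follow the usual description of the Power barriers (lwsync does not order a store before a later load).

```cpp
#include <cassert>

// One bit per ordering a fence enforces between earlier and later accesses.
enum Ordering {
  LoadLoad = 1,
  LoadStore = 2,
  StoreLoad = 4,
  StoreStore = 8
};

// Hypothetical description of a fence: its name and the orderings it enforces.
struct FenceKind {
  const char *Name;
  unsigned Guarantees;
};

// A is at least as strong as B if it enforces every ordering B enforces.
bool isAtLeastAsStrong(const FenceKind &A, const FenceKind &B) {
  return (A.Guarantees & B.Guarantees) == B.Guarantees;
}

bool isStrictlyStronger(const FenceKind &A, const FenceKind &B) {
  return isAtLeastAsStrong(A, B) && A.Guarantees != B.Guarantees;
}

const FenceKind HwSync = {"hwsync",
                          LoadLoad | LoadStore | StoreLoad | StoreStore};
const FenceKind LwSync = {"lwsync", LoadLoad | LoadStore | StoreStore};
// Invented fences, neither of which is stronger than the other.
const FenceKind LoadFence = {"load-fence", LoadLoad};
const FenceKind StoreFence = {"store-fence", StoreStore};
```

With this encoding a backend would register its fences once, and the stronger/weaker predicates could be derived rather than written by hand for each pass instance. As the later comments note, real fences may also need to compare operands and metadata, which this sketch does not model.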

morisset updated this object. Oct 16 2014, 11:21 AM

morisset edited edge metadata.

I missed your idea of running this pass for C++11 fences instead. In short: this is probably useless, as C++11 fences are expected to be rather rare (compared to atomic accesses that result in target-specific fences), and the semantics of C++11 fences are hairy enough I don't want to try without a clear benefit. So while the infrastructure could support it, I don't plan on doing it.

Apply clang-format, fix small style issues pointed by jfb.
I haven't done the bigger modifications yet.

morisset added inline comments. Oct 17 2014, 2:30 PM

include/llvm/Transforms/Scalar.h
44	According to the LLVM programmer's manual, std::function is fine for closures if they are to be stored (like here). And they are used in a few places in the code. An additional difficulty with using an ArrayRef is that there is a lot of information we may want to compare: the intrinsic ID, the operands to the intrinsic, and possibly metadata in the future. So it would be an ArrayRef of tuples of monstrous arity. Is it OK if I keep the current architecture, at least for now?

jfb added inline comments. Oct 17 2014, 2:58 PM

include/llvm/Transforms/Scalar.h
44	I'm convinced :)

Add an option "-view-fences-pre-flow-graph" to visualize the graph being used,
per the request of jfb (it makes things much easier to debug).

Also some general cleanup.

Reviews most welcome !

Initialize the FlowGraphNode::Next iterator more cleanly

morisset added a reviewer: reames. Oct 21 2014, 2:45 PM

ping.

Hi Hal,

Thanks for accepting at the LLVM conference to take a look at this patch !

Wheee.
This brings me back to the world of speculative PRE.

There is a little known paper that makes this actually practical without doing min-cut work.
A person who was at IBM did some work on this.

The paper you want is:
http://link.springer.com/chapter/10.1007%2F11860990_22
A better version is in David J. Pereira's thesis
See https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1&isAllowed=y

This should enable you to do what you want without min-cut, paying a small cost.

I know it's *actually* been implemented in production compilers.

(Discussing this offline with Danny)

I'd like measurements for your approach. I'm not sure it'll be slow on big functions because your graph is a subset of all the IR in the function.

As we discussed, csmith should be able to generate big functions with lots of fences. Measuring runtime at different CFG size and fence+surrounding memory size would be useful, and basic profiling would show where overheads are in the current code.

I think that should inform whether another algorithm should be tried.

Hey, if you can get it to work fast with a min-cut formulation on programs
with lots of fences, that would be awesome :)
As I told JF, I haven't played with such formulations in about 7 years, so
it's entirely possible the world is a better place now.

They are certainly conceptually simpler to work with and reason about :)

Have you been able to get compile-time data for large functions yet? We should move this forward unless there are practical problems with the min-cut algorithm.

lib/Target/PowerPC/PPCTargetMachine.cpp
184	If doing this is generally a good thing, then we should always do it (and on what target would that not be true?). Otherwise, the CFG simplification pass is mostly a wrapper around the SimplifyCFG utility function, and perhaps it should just be called directly from the FencesPRE pass?

Hi,

To dberlin: I looked at this article, but they explain well that the reason min-cut is so expensive for PRE is that it must be repeated for each computation in the function (of which there can be tens of thousands in a very large function) and must look at a potentially huge graph. In comparison, we only run this twice: once for hwsync and once for lwsync. Furthermore, because the graph is stopped by any memory access (and not just the use/kill of some very specific computation as in PRE), I expect each of these runs of min-cut to be quite cheap. I have not had the time to benchmark the compile-time cost of this pass (deadline tomorrow for PLDI), but in summary I expect it to be small, even for large functions full of fences.

Thanks for the comments.
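
For readers unfamiliar with the min-cut formulation discussed above, here is a toy sketch of the idea: edges carry the cost of placing a fence on them, and a minimum cut between the accesses before the fences (source) and the accesses after them (sink) gives the cheapest placement. This sketch uses the Edmonds-Karp algorithm on an adjacency matrix for brevity; it is not the patch's implementation, whose `Height` and `Excess` fields suggest a push-relabel variant. The `diamond` helper and all costs are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

typedef int64_t Capacity;
typedef std::vector<std::vector<Capacity> > Matrix;

// Returns the edges (U, V) of a minimum cut separating S from T.
// Cap is taken by value and turned into residual capacities.
std::vector<std::pair<int, int> > minCut(Matrix Cap, int S, int T) {
  const int N = static_cast<int>(Cap.size());
  const Matrix Orig = Cap;
  for (;;) {
    // Breadth-first search for a shortest augmenting path.
    std::vector<int> Parent(N, -1);
    Parent[S] = S;
    std::queue<int> Work;
    Work.push(S);
    while (!Work.empty() && Parent[T] == -1) {
      int U = Work.front();
      Work.pop();
      for (int V = 0; V < N; ++V)
        if (Parent[V] == -1 && Cap[U][V] > 0) {
          Parent[V] = U;
          Work.push(V);
        }
    }
    if (Parent[T] == -1) {
      // No augmenting path left: the visited nodes form the source side,
      // and the original edges leaving that side form the cut.
      std::vector<std::pair<int, int> > Cut;
      for (int U = 0; U < N; ++U)
        for (int V = 0; V < N; ++V)
          if (Parent[U] != -1 && Parent[V] == -1 && Orig[U][V] > 0)
            Cut.push_back(std::make_pair(U, V));
      return Cut;
    }
    // Push as much flow as the path's narrowest edge allows.
    Capacity Bottleneck = Cap[Parent[T]][T];
    for (int V = T; V != S; V = Parent[V])
      Bottleneck = std::min(Bottleneck, Cap[Parent[V]][V]);
    for (int V = T; V != S; V = Parent[V]) {
      Cap[Parent[V]][V] -= Bottleneck;
      Cap[V][Parent[V]] += Bottleneck;
    }
  }
}

// Hypothetical example graph: node 0 is the source, edge 0->1 is the entry
// block, 1->2 and 1->3 are the two branches, and node 4 is the sink.
Matrix diamond(Capacity EntryCost) {
  Matrix Cap(5, std::vector<Capacity>(5, 0));
  Cap[0][1] = EntryCost;
  Cap[1][2] = 8;
  Cap[1][3] = 8;
  Cap[2][4] = 100;
  Cap[3][4] = 100;
  return Cap;
}
```

With `diamond(11)` the cut is the single entry edge (one fence before the branch, cost 11 instead of 16); with `diamond(100)` the cut is the two branch edges. In the pass itself the graph is only built around fences and stops at memory accesses, which is why each run is expected to be cheap.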

lib/Target/PowerPC/PPCTargetMachine.cpp
184	I agree it might be a good thing to run it anyway on all targets, but some tests (at least) on Power contain conditional jumps based on undef, and SimplifyCFG makes a complete mess of them. The cleanup is mostly because of requiring BreakCriticalEdges (that I have not found how to do on demand while preserving BlockFrequencyInfo yet), so calling it directly from FencesPRE would not solve the issue.

I agree it might be a good thing to run it anyway on all targets, but

some tests (at least) on Power contain conditional jumps based on undef,
and SimplifyCFG makes a complete mess of them.

Would it be worth fixing the tests and turning on SimplifyCFG in a first
pass? Or as you suggested having a disable flag for this (icky...).

In D5758#28, @jfb wrote:

I agree it might be a good thing to run it anyway on all targets, but

some tests (at least) on Power contain conditional jumps based on undef,
and SimplifyCFG makes a complete mess of them.

Would it be worth fixing the tests and turning on SimplifyCFG in a first
pass? Or as you suggested having a disable flag for this (icky...).

There might be no good solution here if we want to run SimplifyCFG generally -- these are bugpoint-generated tests, and thus have a lot of undefs, and will be sensitive to this kind of change. I think a flag is fine.

As noted below, however, a better solution is to have the new fences pass call the associated utility functions directly -- and as noted, this might require some amount of improvement to those utility functions to preserve BFI.

lib/Target/PowerPC/PPCTargetMachine.cpp
184	BreakCriticalEdges is a simple wrapper around the SplitCriticalEdge utility function. If we need to teach SplitCriticalEdge how to preserve BlockFrequencyInfo, then let's do that.
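
As background for this exchange: an edge is critical when its source block has several successors and its destination block has several predecessors, so nothing can be inserted "on the edge" without affecting other paths. The sketch below splits such edges on a toy CFG represented as successor lists. It is an illustration only and does not use or model LLVM's SplitCriticalEdge API; `ToyCFG` and `splitCriticalEdges` are hypothetical names, and block frequencies are not tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: Succs[B] lists the successor block indices of block B.
typedef std::vector<std::vector<int> > ToyCFG;

// Routes every critical edge through a fresh empty block appended to the
// CFG. Returns the number of edges that were split.
int splitCriticalEdges(ToyCFG &Succs) {
  // Count predecessors of each original block.
  std::vector<int> NumPreds(Succs.size(), 0);
  for (const std::vector<int> &Out : Succs)
    for (int Dest : Out)
      ++NumPreds[Dest];

  int NumSplit = 0;
  const std::size_t OrigSize = Succs.size();
  for (std::size_t B = 0; B < OrigSize; ++B) {
    if (Succs[B].size() < 2)
      continue; // A single successor: no outgoing edge can be critical.
    for (std::size_t I = 0; I < Succs[B].size(); ++I) {
      int Dest = Succs[B][I];
      if (NumPreds[Dest] < 2)
        continue; // A single predecessor: the edge is not critical.
      int NewBlock = static_cast<int>(Succs.size());
      Succs.push_back(std::vector<int>(1, Dest)); // NewBlock -> Dest
      Succs[B][I] = NewBlock;                     // B -> NewBlock
      ++NumSplit;
    }
  }
  return NumSplit;
}
```

For a CFG with edges 0->1, 0->2 and 1->2, only 0->2 is critical, so one new block is inserted between 0 and 2. Preserving BlockFrequencyInfo would additionally require assigning the new block the frequency of the edge it replaces, which is the improvement being discussed.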

reames removed a reviewer: reames. Dec 29 2014, 6:20 PM

rengolin resigned from this revision. Mar 10 2015, 4:41 AM

rengolin removed a reviewer: rengolin.

I'd love to see this happen, but it hasn't moved in years. Should we abandon?

This revision now requires changes to proceed. May 7 2018, 10:18 PM

Herald added a reviewer: javed.absar. May 7 2018, 10:18 PM

Herald added subscribers: mgrang, kbarton, kristof.beyls and 2 others.

Revision Contents

Path                                          Size
include/llvm/InitializePasses.h               1 line
include/llvm/Transforms/Scalar.h              17 lines
lib/Target/PowerPC/PPCTargetMachine.cpp       28 lines
lib/Transforms/Scalar/CMakeLists.txt          1 line
lib/Transforms/Scalar/FencesPRE.cpp           689 lines
lib/Transforms/Scalar/Scalar.cpp              1 line
test/CodeGen/PowerPC/atomics-fences-pre.ll    57 lines

Diff 15211

include/llvm/InitializePasses.h

[...]
  void initializeDomOnlyViewerPass(PassRegistry&);
  void initializeDomPrinterPass(PassRegistry&);
  void initializeDomViewerPass(PassRegistry&);
  void initializeDominanceFrontierPass(PassRegistry&);
  void initializeDominatorTreeWrapperPassPass(PassRegistry&);
  void initializeEarlyIfConverterPass(PassRegistry&);
  void initializeEdgeBundlesPass(PassRegistry&);
  void initializeExpandPostRAPass(PassRegistry&);
+ void initializeFencesPREPass(PassRegistry &);
  void initializeGCOVProfilerPass(PassRegistry&);
  void initializeAddressSanitizerPass(PassRegistry&);
  void initializeAddressSanitizerModulePass(PassRegistry&);
  void initializeMemorySanitizerPass(PassRegistry&);
  void initializeThreadSanitizerPass(PassRegistry&);
  void initializeDataFlowSanitizerPass(PassRegistry&);
  void initializeScalarizerPass(PassRegistry&);
  void initializeEarlyCSEPass(PassRegistry&);
[...]

include/llvm/Transforms/Scalar.h

[...]
  // in the Scalar transformations library.
  //
  //===----------------------------------------------------------------------===//

  #ifndef LLVM_TRANSFORMS_SCALAR_H
  #define LLVM_TRANSFORMS_SCALAR_H

  #include "llvm/ADT/StringRef.h"
+ #include "llvm/IR/Intrinsics.h"

  namespace llvm {

  class BasicBlockPass;
  class FunctionPass;
- class Pass;
  class GetElementPtrInst;
+ class Instruction;
+ class Pass;
  class PassInfo;
  class TerminatorInst;
  class TargetLowering;
  class TargetMachine;
+ class Value;
+
+ //===----------------------------------------------------------------------===//
+ //
+ // FencesPRE - Elimination of target-specific fences based on a PRE algorithm
+ //
+ FunctionPass *createFencesPREPass(
+     Intrinsic::ID FenceInt = Intrinsic::not_intrinsic,
+     ArrayRef<Value *> FenceArgs = {},
+     std::function<bool(const Instruction &)> isStrongerFence =
+         [](const Instruction &I) { return false; },
+     std::function<bool(const Instruction &)> isWeakerFence =
+         [](const Instruction &I) { return false; });

  //===----------------------------------------------------------------------===//
  //
  // ConstantPropagation - A worklist driven constant propagation pass
  //
  FunctionPass *createConstantPropagationPass();

  //===----------------------------------------------------------------------===//
[...]

lib/Target/PowerPC/PPCTargetMachine.cpp

  //===-- PPCTargetMachine.cpp - Define TargetMachine for PowerPC -----------===//
  //
  // The LLVM Compiler Infrastructure
  //
  // This file is distributed under the University of Illinois Open Source
  // License. See LICENSE.TXT for details.
  //
  //===----------------------------------------------------------------------===//
  //
  // Top-level implementation for the PowerPC target.
  //
  //===----------------------------------------------------------------------===//

  #include "PPCTargetMachine.h"
  #include "PPC.h"
  #include "llvm/CodeGen/Passes.h"
+ #include "llvm/Transforms/Scalar.h"
  #include "llvm/IR/Function.h"
+ #include "llvm/IR/IntrinsicInst.h"
+ #include "llvm/IR/Intrinsics.h"
  #include "llvm/MC/MCStreamer.h"
  #include "llvm/PassManager.h"
  #include "llvm/Support/CommandLine.h"
  #include "llvm/Support/FormattedStream.h"
  #include "llvm/Support/TargetRegistry.h"
  #include "llvm/Target/TargetOptions.h"
  using namespace llvm;

[...]
  } // namespace

  TargetPassConfig *PPCTargetMachine::createPassConfig(PassManagerBase &PM) {
    return new PPCPassConfig(this, PM);
  }

  void PPCPassConfig::addIRPasses() {
    addPass(createAtomicExpandPass(&getPPCTargetMachine()));
+   if (getOptLevel() != CodeGenOpt::None) {
+     // We first try to eliminate redundant hwsync instructions, ignoring
+     // lwsyncs.
+     addPass(createFencesPREPass(
+         Intrinsic::ppc_sync, {},
+         /*IsStrongerFence=*/[](const Instruction &I) { return false; },
+         /*IsWeakerFence=*/[](const Instruction &I) {
+           if (auto FI = dyn_cast<IntrinsicInst>(&I))
+             if (FI->getIntrinsicID() == Intrinsic::ppc_lwsync) return true;
+           return false;
+         }));
+     // Then we eliminate redundant lwsyncs, taking into account the hwsyncs.
+     addPass(createFencesPREPass(
+         Intrinsic::ppc_lwsync, {},
+         /*IsStrongerFence=*/[](const Instruction &I) {
+           if (auto FI = dyn_cast<IntrinsicInst>(&I))
+             if (FI->getIntrinsicID() == Intrinsic::ppc_sync) return true;
+           return false;
+         },
+         /*IsWeakerFence=*/[](const Instruction &I) { return false; }));
+     // These two passes require that critical edges be split. So we should
+     // cleanup the CFG afterwards.
+     // FIXME: breaks a bunch of brittle tests
+     // addPass(createCFGSimplificationPass());
+   }
    TargetPassConfig::addIRPasses();
  }

  bool PPCPassConfig::addPreISel() {
    if (!DisableCTRLoops && getOptLevel() != CodeGenOpt::None)
      addPass(createPPCCTRLoops(getPPCTargetMachine()));

    return false;
[...]

lib/Transforms/Scalar/CMakeLists.txt

  add_llvm_library(LLVMScalarOpts
    ADCE.cpp
    AlignmentFromAssumptions.cpp
    ConstantHoisting.cpp
    ConstantProp.cpp
    CorrelatedValuePropagation.cpp
    DCE.cpp
    DeadStoreElimination.cpp
    EarlyCSE.cpp
+   FencesPRE.cpp
    FlattenCFGPass.cpp
    GVN.cpp
    IndVarSimplify.cpp
    JumpThreading.cpp
    LICM.cpp
    LoadCombine.cpp
    LoopDeletion.cpp
    LoopIdiomRecognize.cpp
[...]

lib/Transforms/Scalar/FencesPRE.cpp

This file was added.

				//===- FencesPRE.cpp - Fence Elimination Implementation -=====================//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass eliminates target-specific fence intrinsics that are redundant, or
				// partially redundant if it can introduce new fences to make them fully
				// redundant.
				//
				// The algorithm is based on the following PRE paper, adapted to work on fences:
				// Bernhard Scholz, R. Nigel Horspool, Jens Knoop:
				// Optimizing for space and time usage with speculative partial redundancy
				// elimination.
				// LCTES 2004: 221-230
				//
				// The general idea is to build a graph based on the CFG, with a node for each
				// instruction that may affect memory, and a node for each join/split in the
				// CFG, label this graph with the cost per edge of having a fence on that edge
				// (currently using BlockFrequencyInfo). Then each memory-affecting instruction
				// immediately before a fence is marked as a source, each one immediately after
				// a fence is marked as a sink, and a min-cut provides us a minimum-cost set of
				// fences that provide the same ordering guarantees.
				// There are a few subtleties: for example some nodes must be duplicated to
				// prevent them from being both a source and a sink; and the graph is only
				// built around the fences (to avoid the cost of allocating a potentially huge
				// graph).
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/GraphTraits.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/StringExtras.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/IR/CFG.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/DOTGraphTraits.h"
				#include "llvm/Support/GraphWriter.h"
				#include "llvm/Transforms/Scalar.h"
				#include <limits>

				using namespace llvm;

				#define DEBUG_TYPE "fences-pre"

				STATISTIC(NumFencesDetected, "Number of fences before FencesPRE");
				STATISTIC(NumFencesInserted, "Number of fences added by FencesPRE");
				STATISTIC(NumFencesDeleted, "Number of fences erased by FencesPRE");

				static cl::opt<bool> ViewFencesPREFlowGraph(
				"view-fences-pre-flow-graph", cl::Hidden, cl::init(false),
				cl::desc(
				"Show a representation in dot of the flowgraph used by the fences-pre "
				"pass"));

				namespace {
				// signed so that we can also represent flow.
				// Capacity of edges is (BlockFrequency >> 2) + 1
				// (shifted to fit in 63 bits, +1 because adding a fence increases code size,
				// even in a block that is never executed).
				typedef int64_t Capacity;
				typedef uint32_t Height;

				struct FlowGraphNode {
				Instruction *I;
				// nullptr unless there is exactly one edge leading to this node, and a
				// fence exist on this edge. Used to keep existing fences when possible,
				// instead of erasing them and inserting a new one at the same spot.
				Instruction *FenceBeforeIt;
				Capacity Excess;
				Height Height;
				// This bit is used in the algorithm at the end to extract the min-cut
				// from the max-flow.
				bool Mark;
				std::vector<std::pair<FlowGraphNode *, Capacity> > Succs;
				std::vector<std::pair<FlowGraphNode *, Capacity> >::iterator Next;
				// Source nodes must be separate, or a single node could be source and
				// sink, which causes havoc. Terminator nodes are also duplicated to
				// avoid trouble when a basic block contains just a terminator node.
				// This is a unique_ptr because the node must be freed at the end, and
				// is not in the FlowGraph map (since the non-duplicate already is).
				std::unique_ptr<FlowGraphNode> DupNode;
				jfb: You don't need to `typedef struct A {} A;` in C++, just `struct A {};` is enough.
				morisset: Thanks, I didn't know that.
				FlowGraphNode() LLVM_DELETED_FUNCTION;
				FlowGraphNode(Instruction *Inst)
				: I(Inst),
				FenceBeforeIt(nullptr),
				Excess(0),
				Height(0),
				Mark(false),
				Succs(),
				Next(Succs.begin()),
				DupNode() {}
				};

				typedef std::pair<FlowGraphNode *, FlowGraphNode *> Edge;
				typedef DenseMap<Instruction *, std::unique_ptr<FlowGraphNode> > FlowGraph;
				typedef SmallPtrSetImpl<FlowGraphNode *> NodeSubset;
				typedef SmallVectorImpl<Edge> Cut;
				typedef DenseMap<Edge, Capacity> Flow;
				typedef std::vector<std::vector<FlowGraphNode *> > Buckets;
				jfb: You don't need `struct` here (C versus C++).
				morisset: Same

				struct FlowGraphViewer {
				const FlowGraph *InstToNodeMap;
				jfb: You can use a delegating constructor to `FlowGraphNode(Instruction * Inst)`, though it doesn't look like this constructor is used, so you could mark it as `= delete` instead.
				morisset: I will look into it, I think I had to add this constructor to silence a warning, it should indeed be unused.
				const NodeSubset *Sources;
				const NodeSubset *Sinks;
				const Flow *F;
				FlowGraphViewer(const FlowGraph *FG, const NodeSubset *S, const NodeSubset *T,
				const Flow *Fl)
				: InstToNodeMap(FG), Sources(S), Sinks(T), F(Fl) {}
				};

				class FencesPRE : public FunctionPass {
				public:
				static char ID;
				FencesPRE(Intrinsic::ID FenceInt = Intrinsic::not_intrinsic,
				ArrayRef<Value *> Args = {},
				// Stronger fences can make the fence under consideration
				// redundant.
				std::function<bool(const Instruction &)> IsStrongerFence =
				[](const Instruction &I) { return false; },
				// Weaker fences are ignored by the algorithm.
				std::function<bool(const Instruction &)> IsWeakerFence =
				[](const Instruction &I) { return false; })
				: FunctionPass(ID),
				FenceIntrinsicID(FenceInt),
				FenceArgs(Args),
				isStrongerFence(IsStrongerFence),
				isWeakerFence(IsWeakerFence) {
				initializeFencesPREPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override;

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				AU.addRequiredID(BreakCriticalEdgesID);
				AU.addRequired<BlockFrequencyInfo>();
				}

				private:
				Intrinsic::ID FenceIntrinsicID;
				ArrayRef<Value *> FenceArgs;
				std::function<bool(const Instruction &)> isStrongerFence;
				std::function<bool(const Instruction &)> isWeakerFence;
				BlockFrequencyInfo *BFI;

				FlowGraphNode *makeGraphUpwards(const Instruction &FI, Instruction &Root,
				FlowGraph &FlowGraph,
				NodeSubset &Sources) const;
				FlowGraphNode *makeGraphDownwards(const Instruction &FI, Instruction &Root,
				FlowGraph &FlowGraph,
				NodeSubset &Sources) const;
				void findCut(const FlowGraph &FlowGraph, const NodeSubset &Sources,
				const NodeSubset &Sinks, Cut &Cut) const;
				void insertFenceBefore(Instruction &I) const;
				bool isStop(const Instruction &I) const;
				bool isWantedFence(const Instruction &I) const;
				FlowGraphNode *getNode(FlowGraph &FlowGraph, Instruction &I,
				bool shouldDuplicate = false) const;
				jfb: Does this ever happen? It seems like it's an error and should be an `assert`.
				morisset: OK.
				void addEdge(FlowGraphNode *N1, FlowGraphNode *N2) const;
				Capacity getResidualCapacity(Flow &Flow, Capacity C, FlowGraphNode &U,
				FlowGraphNode &V) const;
				bool push(Buckets &B, Flow &Flow, const NodeSubset &S, const NodeSubset &T,
				Capacity C, FlowGraphNode &U, FlowGraphNode &V) const;
				void relabel(Flow &Flow, FlowGraphNode &U) const;
				Height discharge(Buckets &B, Flow &Flow, const NodeSubset &S,
				const NodeSubset &T, FlowGraphNode &U) const;
				jfb: Why 1 in all cases? That seems pretty low, especially since this is all stack allocated it's pretty cheap to start with 4 or 8, though real-world numbers are better.
				morisset: My reasoning is that most function will have no fences, so 0 element for these, or just be a tiny helper with only one atomic access. But I can easily change it to 4 or 8 as it is so cheap.
				void markNode(Flow &Flow, Cut &Cut, FlowGraphNode &U) const;
				};
				}

				char FencesPRE::ID = 0;
				INITIALIZE_PASS_BEGIN(FencesPRE, "fences-pre",
				"Partial redundancy elimination for fences", false, false)
				INITIALIZE_PASS_DEPENDENCY(BreakCriticalEdges)
				INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfo)
				INITIALIZE_PASS_END(FencesPRE, "fences-pre",
				"Partial redundancy elimination for fences", false, false)
				FunctionPass *llvm::createFencesPREPass(
				Intrinsic::ID FenceInt, ArrayRef<Value *> FenceArgs,
				std::function<bool(const Instruction &)> isStrongerFence,
				std::function<bool(const Instruction &)> isWeakerFence) {
				return new FencesPRE(FenceInt, FenceArgs, isStrongerFence, isWeakerFence);
				}

				bool FencesPRE::runOnFunction(Function &F) {
				assert(FenceIntrinsicID);
				BFI = &getAnalysis<BlockFrequencyInfo>();

				// This FlowGraph ensures that the entire graph is freed at the end of the
				// function.
				DenseMap<Instruction *, std::unique_ptr<FlowGraphNode> > FlowGraph;
				SmallPtrSet<FlowGraphNode *, 4> Sources;
				SmallPtrSet<FlowGraphNode *, 4> Sinks;
				SmallVector<Edge, 4> Cut;
				SmallPtrSet<Instruction *, 8> Fences;

				jfb: What is this loop doing? Putting it in its own function could make things easier to understand.
				morisset: OK, I will try to put it into its own function, I hoped the comments inside would be enough.
				for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
				if (isWantedFence(*I)) {
				Fences.insert(&*I);
				++NumFencesDetected;
				FlowGraphNode *UpNode = makeGraphUpwards(*I, *I, FlowGraph, Sources);
				FlowGraphNode *DownNode = makeGraphDownwards(*I, *I, FlowGraph, Sinks);
				// These nodes can be null if there is a stronger fence just
				// before/after this one
				if (UpNode && DownNode) {
				DownNode->FenceBeforeIt = &*I;
				addEdge(UpNode, DownNode);
				}
				}
				}
				if (Fences.empty()) return false;

				DEBUG(dbgs() << "Sources:\n"; for (auto S
				: Sources) {
				dbgs() << "\t" << S << " : ";
				S->I->dump();
				});
				DEBUG(dbgs() << "Sinks:\n"; for (auto T
				: Sinks) {
				dbgs() << "\t" << T << " : ";
				T->I->dump();
				});

				findCut(FlowGraph, Sources, Sinks, Cut);

				// We want fences on all the edges in the cut and nowhere else.
				// For statistics/debugging, we don't want to count the fences already on
				// the cut as deleted and added back. So for every edge in the cut, we
				// add a fence if there is none, or mark one on the edge for
				// preservation if there are some. Then we delete every fence not marked
				// for preservation. Being "marked for preservation" in this context is
				// just being removed from the 'Fences' set.
				for (auto NPair : Cut) {
				assert(NPair.first->I->getParent() == NPair.second->I->getParent());
				DEBUG(dbgs() << "Pair in the cut:" << NPair.first << " - " << NPair.second
				<< "\n";
				dbgs() << "\t"; NPair.first->I->dump(); dbgs() << "\t";
				NPair.second->I->dump());
				// If a fence is already there, just mark it as not for deletion.
				Instruction *FenceBetweenNPair = NPair.second->FenceBeforeIt;
				if (FenceBetweenNPair != nullptr) {
				DEBUG(dbgs() << "preserving fence " << FenceBetweenNPair << "\n");
				Fences.erase(FenceBetweenNPair);
				continue;
				}
				// Otherwise, insert one.
				insertFenceBefore(*(NPair.second->I));
				++NumFencesInserted;
				}
				// And then erase any fence that is still marked for deletion.
				for (auto FI : Fences) {
				FI->eraseFromParent();
				++NumFencesDeleted;
				}

				return !Fences.empty();
				}

				bool FencesPRE::isWantedFence(const Instruction &I) const {
				auto FI = dyn_cast<IntrinsicInst>(&I);
				if (!FI) return false;
				if (FI->getIntrinsicID() != FenceIntrinsicID) return false;
				unsigned NumArgs = FI->getNumArgOperands();
				if (NumArgs != FenceArgs.size()) return false;
				for (unsigned i = 0; i < NumArgs; ++i) {
				// Args for CallInst are enumerated from 1.
				if (FenceArgs[i] != FI->getArgOperand(i + 1)) return false;
				}

				return true;
				}

				FlowGraphNode *FencesPRE::getNode(FlowGraph &FlowGraph, Instruction &I,
				bool shouldDuplicate) const {
				// Does nothing if already present.
				// FIXME: this does one allocation every time.
				FlowGraph.insert(
				make_pair(&I, std::unique_ptr<FlowGraphNode>(new FlowGraphNode(&I))));
				auto Node = FlowGraph[&I].get();
				if (shouldDuplicate) {
				if (!Node->DupNode) {
				Node->DupNode = std::unique_ptr<FlowGraphNode>(new FlowGraphNode(&I));
				}
				Node = Node->DupNode.get();
				}
				return Node;
				}

				void FencesPRE::addEdge(FlowGraphNode *N1, FlowGraphNode *N2) const {
				if (N1 == nullptr || N2 == nullptr) return;

				auto BB1 = N1->I->getParent(), BB2 = N2->I->getParent();
				Capacity Capacity;
				if (BB1 != BB2) Capacity = (std::numeric_limits<uint64_t>::max() >> 1);
				// Plus one, because it should never be 0, because of code size, inserting a
				// fence is never completely free, even if the code is only very rarely
				// executed
				else
				Capacity = (BFI->getBlockFreq(BB1).getFrequency() >> 2) + 1;

				N1->Succs.push_back(std::make_pair(N2, Capacity));
				// We also insert the reverse edge so that excess preflow can flow backward.
				N2->Succs.push_back(std::make_pair(N1, 0));
				DEBUG(dbgs() << "addEdge (" << Capacity << "): " << N1 << " -> " << N2
				<< "\n";
				dbgs() << "\t"; N1->I->dump(); dbgs() << "\t"; N2->I->dump(););
				}

				// FIXME: The first argument is currently unused, will be useful once metadata
				// is attached to fences by AtomicExpand.
				FlowGraphNode *FencesPRE::makeGraphUpwards(const Instruction &FI,
				Instruction &Root,
				FlowGraph &FlowGraph,
				NodeSubset &Sources) const {
				auto BB = Root.getParent();
				DEBUG(dbgs() << "makeGraphUpwards from " << &Root << " : "; Root.dump());

				// Get a reverse iterator at the Root instruction
				BasicBlock::reverse_iterator it = BB->rbegin();
				for (; &*it != &Root; ++it) {
				}

				for (auto b = BB->rend(); it != b; ++it) {
				if (isStrongerFence(*it)) return nullptr;
				if (!isStop(*it)) continue;

				auto Node = getNode(FlowGraph, *it, true);
				Sources.insert(Node);
				return Node;
				}

				// We have reached the beginning of the basic block without finding any
				// reason to stop. Let's continue in the predecessors.
				auto InsertPt = BB->getFirstInsertionPt();
				auto Node = getNode(FlowGraph, *InsertPt);
				// But first, we should check whether we are at the beginning of the function.
				if (&BB->getParent()->getEntryBlock() == BB) {
				Sources.insert(Node);
				return Node;
				}
				for (auto PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
				// FIXME: add a conservative option where we do not go forward if there is
				// any of our predecessor that we do not post-dominate.
				BasicBlock *BBP = *PI;
				Instruction *TermInst = BBP->getTerminator();
				auto TNode = getNode(FlowGraph, *TermInst, true);
				auto UpNode = makeGraphUpwards(FI, *TermInst, FlowGraph, Sources);
				if (UpNode) {
				addEdge(TNode, Node);
				addEdge(UpNode, TNode);
				}
				}
				return Node;
				}

				// FIXME: reduce duplication with makeGraphUpwards
				FlowGraphNode *FencesPRE::makeGraphDownwards(const Instruction &FI,
				Instruction &Root,
				FlowGraph &FlowGraph,
				NodeSubset &Sinks) const {
				auto BB = Root.getParent();
				DEBUG(dbgs() << "makeGraphDownwards from " << &Root << " : "; Root.dump());

				// Get an iterator at the Root instruction
				BasicBlock::iterator it = BB->begin();
				for (; &*it != &Root; ++it) {
				}

				for (auto b = BB->end(); it != b; ++it) {
				if (isStrongerFence(*it)) return nullptr;
				if (!isStop(*it)) continue;

				auto Node = getNode(FlowGraph, *it);
				Sinks.insert(Node);
				return Node;
				}

				// We have reached the end of the basic block without finding any
				// reason to stop. Let's continue in the successors.
				auto TermInst = BB->getTerminator();
				auto Node = getNode(FlowGraph, *TermInst, true);
				for (auto PI = succ_begin(BB), E = succ_end(BB); PI != E; ++PI) {
				// FIXME: add a conservative option where we do not go forward if there is
				// any of our predecessor that we do not post-dominate.
				BasicBlock *BBS = *PI;
				Instruction *InsertPt = BBS->getFirstInsertionPt();
				auto INode = getNode(FlowGraph, *InsertPt);
				auto DownNode = makeGraphDownwards(FI, *InsertPt, FlowGraph, Sinks);
				if (DownNode) {
				addEdge(Node, INode);
				addEdge(INode, DownNode);
				}
				}
				return Node;
				}

				bool FencesPRE::isStop(const Instruction &I) const {
				return (I.mayReadOrWriteMemory() && !isWantedFence(I) && !isWeakerFence(I)) ||
				isa<ReturnInst>(I);
				}

				void FencesPRE::insertFenceBefore(Instruction &I) const {
				IRBuilder<> Builder(&I);
				Module *M = Builder.GetInsertBlock()->getParent()->getParent();
				Function *Fence = Intrinsic::getDeclaration(M, FenceIntrinsicID);
				Builder.CreateCall(Fence, FenceArgs);
				}

				/* ----- Implementation of a basic push-relabel min-cut algorithm ----- */

				Capacity FencesPRE::getResidualCapacity(Flow &Flow, Capacity C,
				FlowGraphNode &U,
				FlowGraphNode &V) const {
				return C - Flow[std::make_pair(&U, &V)];
				jfb: I'd avoid the macros here and in other places. You have a `typedef` for the types, so it would be better to use `std::numeric_limits`.
				morisset: OK (sorry I forgot this, I remember you suggested it already).
				}

				bool FencesPRE::push(Buckets &B, Flow &Flow, const NodeSubset &S,
				const NodeSubset &T, Capacity C, FlowGraphNode &U,
				FlowGraphNode &V) const {
				auto amount = std::min(U.Excess, getResidualCapacity(Flow, C, U, V));
				if (amount <= 0) return false;
				Flow[std::make_pair(&U, &V)] += amount;
				Flow[std::make_pair(&V, &U)] -= amount;
				U.Excess -= amount;
				V.Excess += amount;
				DEBUG(dbgs() << "\t push " << amount << " : " << &U << " -> " << &V << "\n");
				if (S.count(&V) || T.count(&V)) return true;
				if (B.size() <= V.Height) B.resize(V.Height + 1);
				// FIXME: it should probably be a set to remove duplicates for efficiency
				B[V.Height].push_back(&V);
				return true;
				}

				void FencesPRE::relabel(Flow &Flow, FlowGraphNode &U) const {
				auto minHeight = std::numeric_limits<Height>::max();
				for (auto &VWithCap : U.Succs) {
				auto &V = *VWithCap.first;
				auto C = VWithCap.second;
				if (getResidualCapacity(Flow, C, U, V) > 0)
				minHeight = std::min(minHeight, V.Height + 1);
				}
				DEBUG(dbgs() << "\trelabel " << &U << " from " << U.Height << " to "
				<< minHeight << "\n");
				assert(minHeight != std::numeric_limits<Height>::max());
				U.Height = minHeight;
				}

				Height FencesPRE::discharge(Buckets &B, Flow &Flow, const NodeSubset &S,
				const NodeSubset &T, FlowGraphNode &U) const {
				Height MaxHeight = 0;
				while (U.Excess > 0) {
				if (U.Next != U.Succs.end()) {
				auto VWithCap = *U.Next;
				auto &V = *VWithCap.first;
				auto C = VWithCap.second;
				if (U.Height > V.Height && push(B, Flow, S, T, C, U, V))
				MaxHeight = std::max(MaxHeight, V.Height);
				else
				++U.Next;
				} else {
				relabel(Flow, U);
				U.Next = U.Succs.begin();
				}
				}
				return MaxHeight;
				}

				void FencesPRE::markNode(Flow &Flow, Cut &Cut, FlowGraphNode &U) const {
				if (U.Mark) return;
				U.Mark = true;
				DEBUG(dbgs() << "Marking node " << &U << " : "; U.I->dump());
				for (auto &VWithCap : U.Succs)
				if (getResidualCapacity(Flow, VWithCap.second, U, *VWithCap.first) > 0)
				markNode(Flow, Cut, *VWithCap.first);
				else
				Cut.push_back(std::make_pair(&U, VWithCap.first));
				}

				void FencesPRE::findCut(const FlowGraph &FlowGraph, const NodeSubset &Sources,
				const NodeSubset &Sinks, Cut &Cut) const {
				Buckets Buckets;
				// The flow is a separate map and not part of the graph, because when pushing
				// along an edge U->V we must also update Flow[V->U] and not just Flow[U->V].
				Flow Flow;

				// First we must make all of the Succs vectors into actual maps.
				for (auto &InstToNode : FlowGraph) {
				auto N = InstToNode.second.get();
				do {
				std::sort(N->Succs.begin(), N->Succs.end());
				auto last = std::unique(N->Succs.begin(), N->Succs.end());
				N->Succs.erase(last, N->Succs.end());
				// And not forget to initialize the N->Next iterators.
				N->Next = N->Succs.begin();
				} while ((N = N->DupNode.get()));
				}

				// Initialization of the sources
				for (auto S : Sources) {
				S->Excess = std::numeric_limits<Capacity>::max();
				S->Height = FlowGraph.size();
				for (auto &VWithCap : S->Succs)
				push(Buckets, Flow, Sources, Sinks, VWithCap.second, *S, *VWithCap.first);
				}

				// We are in a trivial case with no more work to do.
				if (Buckets.size() == 0) {
				for (auto N : Sources) markNode(Flow, Cut, *N);
				return;
				}

				// signed int so we can detect when it hits -1
				int CurrentActiveHeight = 0;
				while (CurrentActiveHeight >= 0) {
				if (Buckets[CurrentActiveHeight].empty()) {
				--CurrentActiveHeight;
				continue;
				}
				FlowGraphNode &N = *Buckets[CurrentActiveHeight].back();
				Buckets[CurrentActiveHeight].pop_back();

				DEBUG(dbgs() << "Discharging node " << &N << " : "; N.I->dump());
				Height NewMaxHeight = discharge(Buckets, Flow, Sources, Sinks, N);
				CurrentActiveHeight = std::max(CurrentActiveHeight, (int)NewMaxHeight);
				}
				// We are done, just produce the cut now;
				for (auto N : Sources) markNode(Flow, Cut, *N);

				if (ViewFencesPREFlowGraph)
				ViewGraph(FlowGraphViewer(&FlowGraph, &Sources, &Sinks, &Flow), "");
				}

				/* ----- Graph pretty-printing ----- */

				// GraphTraits are used to be able to print the flow-graph for debugging.
				namespace {
				struct FGNSuccsIterator
				: std::iterator<std::forward_iterator_tag, FlowGraphNode> {
				std::vector<std::pair<FlowGraphNode *, Capacity> >::iterator InternalIterator;
				std::vector<std::pair<FlowGraphNode *, Capacity> >::iterator InternalEnd;
				FGNSuccsIterator(
				std::vector<std::pair<FlowGraphNode *, Capacity> >::iterator It,
				std::vector<std::pair<FlowGraphNode *, Capacity> >::iterator End)
				: InternalIterator(It), InternalEnd(End) {
				// Skip the first element(s) if it has a capacity of 0.
				if (It != End && It->second == 0) this->operator++();
				}
				iterator &operator++() {
				do {
				++InternalIterator;
				// We skip all the edges of capacity 0, they are added by the algorithm
				// for pushing the flow back at the end.
				} while (InternalIterator != InternalEnd && InternalIterator->second == 0);
				return *this;
				}
				iterator operator++(int n) {
				auto tmp = *this;
				++(*this);
				return tmp;
				}
				// FIXME: this should return a FlowGraphNode&, but then it does not compile.
				FlowGraphNode *operator*() const { return InternalIterator->first; }
				FlowGraphNode *operator->() const { return InternalIterator->first; }
				bool operator==(FGNSuccsIterator &other) {
				return other.InternalIterator == InternalIterator;
				}
				bool operator!=(FGNSuccsIterator other) { return !(*this == other); }
				};
				struct FGNodesIterator
				: std::iterator<std::forward_iterator_tag, FlowGraphNode> {
				DenseMap<Instruction *, std::unique_ptr<FlowGraphNode> >::const_iterator
				InternalIterator;
				bool IsDup;
				FGNodesIterator(DenseMap<Instruction *,
				std::unique_ptr<FlowGraphNode> >::const_iterator It)
				: InternalIterator(It), IsDup(false) {}
				FlowGraphNode *operator->() const {
				auto NodePtr = InternalIterator->second.get();
				if (IsDup)
				return NodePtr->DupNode.get();
				else
				return NodePtr;
				}
				// FIXME: this should return a FlowGraphNode&, but then it does not compile.
				FlowGraphNode *operator*() const { return (this->operator->()); }
				iterator &operator++() {
				if ((*this)->DupNode) {
				assert(!IsDup);
				IsDup = true;
				return *this;
				}
				IsDup = false;
				++InternalIterator;
				return *this;
				}
				iterator operator++(int n) {
				auto tmp = *this;
				++(*this);
				return tmp;
				}
				bool operator==(FGNodesIterator &other) {
				return other.InternalIterator == InternalIterator && other.IsDup == IsDup;
				}
				bool operator!=(FGNodesIterator other) { return !(*this == other); }
				};
				}
				namespace llvm {
				template <>
				struct GraphTraits<FlowGraphViewer> {
				typedef FlowGraphNode NodeType;

				typedef FGNSuccsIterator ChildIteratorType;
				static ChildIteratorType child_begin(NodeType *N) {
				return FGNSuccsIterator(N->Succs.begin(), N->Succs.end());
				}
				static ChildIteratorType child_end(NodeType *N) {
				return FGNSuccsIterator(N->Succs.end(), N->Succs.end());
				}

				typedef FGNodesIterator nodes_iterator;
				static nodes_iterator nodes_begin(const FlowGraphViewer &G) {
				return FGNodesIterator(G.InstToNodeMap->begin());
				}
				static nodes_iterator nodes_end(const FlowGraphViewer &G) {
				return FGNodesIterator(G.InstToNodeMap->end());
				}

				static FlowGraphNode *getEntryNode(const FlowGraphViewer &) {
				llvm_unreachable("getEntryNode undefined for flowgraphs");
				// They may very well have several source nodes, and even have unconnected
				// parts.
				}
				static unsigned size(FlowGraphViewer *G) {
				llvm_unreachable("Size not implemented for flowgraphs.");
				// And not trivial either, as some nodes may be duplicated, see DupNode in
				// FlowGraphNode definition.
				}
				};
				// FIXME: Nodes and their DupNode should probably be linked in the graph
				// FIXME: Different style for edges in the cut.
				template <>
				struct DOTGraphTraits<FlowGraphViewer> : public DefaultDOTGraphTraits {
				DOTGraphTraits(bool simple = false) : DefaultDOTGraphTraits(simple) {}
				bool isNodeHidden(FlowGraphNode *N) { return N->Succs.empty(); }
				static bool hasNodeAddressLabel(const void *N, const FlowGraphViewer &G) {
				return true;
				}
				std::string getNodeLabel(FlowGraphNode *N, const FlowGraphViewer &G) {
				auto I = N->I;
				std::string Str;
				raw_string_ostream OS(Str);
				if (G.InstToNodeMap->find(I)->second.get() != N) OS << "DUP: ";
				OS << *I;
				return OS.str();
				}
				static std::string getNodeAttributes(const FlowGraphNode *Node,
				const FlowGraphViewer &G) {
				// FIXME: SmallPtrSet::count() should accept const pointers
				auto N = const_cast<FlowGraphNode *>(Node);
				if (G.Sources->count(N)) return "color=green";
				if (G.Sinks->count(N)) return "color=blue";
				return "";
				}
				template <typename EdgeIter>
				static std::string getEdgeAttributes(FlowGraphNode *N, EdgeIter EI,
				const FlowGraphViewer &G) {
				auto F = G.F->lookup(std::make_pair(N, *EI));
				auto C = EI.InternalIterator->second;
				// Color inter-block edges
				if (C == (std::numeric_limits<uint64_t>::max() >> 1))
				return "color=red,style=dashed";
				return ("label=\" " + itostr(F) + "/" + itostr(C) + " \"");
				}
				};
				}
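
For readers following the min-cut part of the patch, here is a minimal standalone sketch of the same idea: weight intra-block edges with `(BlockFrequency >> 2) + 1`, give inter-block edges a capacity no cut can afford, compute a maximum flow, then read the cut off the residual graph (the role `markNode` plays above). It is not the patch's algorithm: it swaps in plain Edmonds-Karp on an adjacency matrix instead of push-relabel, and uses a single source and sink. `ToyFlowGraph`, `buildExample`, the node numbers, and the block frequencies are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Capacity = int64_t;

// As in the patch: inter-block edges get a capacity that no cut can afford.
const Capacity InfiniteCapacity =
    static_cast<Capacity>(std::numeric_limits<uint64_t>::max() >> 1);

// As in the patch: shifted block frequency, plus one so that a fence is never
// free even in a block that is never executed.
Capacity edgeCapacity(uint64_t BlockFreq) {
  return static_cast<Capacity>(BlockFreq >> 2) + 1;
}

// Hypothetical toy graph; nodes are small integers rather than instructions.
struct ToyFlowGraph {
  int N;
  std::vector<std::vector<Capacity>> Cap;      // original capacities
  std::vector<std::vector<Capacity>> Residual; // remaining capacities

  explicit ToyFlowGraph(int NumNodes)
      : N(NumNodes), Cap(NumNodes, std::vector<Capacity>(NumNodes, 0)),
        Residual(NumNodes, std::vector<Capacity>(NumNodes, 0)) {}

  void addEdge(int U, int V, Capacity C) {
    Cap[U][V] = C;
    Residual[U][V] = C;
  }

  // Edmonds-Karp: augment along shortest residual paths until none is left.
  Capacity maxFlow(int S, int T) {
    Capacity Total = 0;
    for (;;) {
      std::vector<int> Parent(N, -1);
      Parent[S] = S;
      std::queue<int> Q;
      Q.push(S);
      while (!Q.empty() && Parent[T] == -1) {
        int U = Q.front();
        Q.pop();
        for (int V = 0; V < N; ++V)
          if (Parent[V] == -1 && Residual[U][V] > 0) {
            Parent[V] = U;
            Q.push(V);
          }
      }
      if (Parent[T] == -1)
        return Total; // no augmenting path: the flow is maximal
      Capacity Bottleneck = InfiniteCapacity;
      for (int V = T; V != S; V = Parent[V])
        Bottleneck = std::min(Bottleneck, Residual[Parent[V]][V]);
      for (int V = T; V != S; V = Parent[V]) {
        Residual[Parent[V]][V] -= Bottleneck;
        Residual[V][Parent[V]] += Bottleneck; // reverse edge lets flow go back
      }
      Total += Bottleneck;
    }
  }

  // Call after maxFlow: mark everything reachable from S in the residual
  // graph; original edges leaving the marked set form a minimum cut.
  std::vector<std::pair<int, int>> minCut(int S) const {
    std::vector<bool> Marked(N, false);
    std::vector<int> Stack{S};
    Marked[S] = true;
    while (!Stack.empty()) {
      int U = Stack.back();
      Stack.pop_back();
      for (int V = 0; V < N; ++V)
        if (!Marked[V] && Residual[U][V] > 0) {
          Marked[V] = true;
          Stack.push_back(V);
        }
    }
    std::vector<std::pair<int, int>> Cut;
    for (int U = 0; U < N; ++U)
      for (int V = 0; V < N; ++V)
        if (Marked[U] && !Marked[V] && Cap[U][V] > 0)
          Cut.push_back(std::make_pair(U, V));
    return Cut;
  }
};

// Hypothetical example: node 0 is the source, node 3 the sink.
ToyFlowGraph buildExample() {
  ToyFlowGraph G(4);
  G.addEdge(0, 1, edgeCapacity(40)); // capacity 11
  G.addEdge(0, 2, edgeCapacity(8));  // capacity 3
  G.addEdge(1, 2, InfiniteCapacity); // inter-block edge, never cut
  G.addEdge(1, 3, edgeCapacity(4));  // capacity 2
  G.addEdge(2, 3, edgeCapacity(16)); // capacity 5
  return G;
}
```

On this graph the cheapest cut consists of the two edges into the sink (total weight 7), so in the patch's terms fences would be placed on those two edges and nowhere else; the infinite-capacity edge is never selected.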

lib/Transforms/Scalar/Scalar.cpp

[...]
  void llvm::initializeScalarOpts(PassRegistry &Registry) {
  initializeADCEPass(Registry);
  initializeAlignmentFromAssumptionsPass(Registry);
  initializeSampleProfileLoaderPass(Registry);
  initializeConstantHoistingPass(Registry);
  initializeConstantPropagationPass(Registry);
  initializeCorrelatedValuePropagationPass(Registry);
  initializeDCEPass(Registry);
  initializeDeadInstEliminationPass(Registry);
+ initializeFencesPREPass(Registry);
  initializeScalarizerPass(Registry);
  initializeDSEPass(Registry);
  initializeGVNPass(Registry);
  initializeEarlyCSEPass(Registry);
  initializeFlattenCFGPassPass(Registry);
  initializeIndVarSimplifyPass(Registry);
  initializeJumpThreadingPass(Registry);
  initializeLICMPass(Registry);
[...]

test/CodeGen/PowerPC/atomics-fences-pre.ll

This file was added.

				; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc32 -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=PPC32
				; FIXME: -verify-machineinstrs currently fail on ppc64 (mismatched register/instruction).
				; This is already checked for in Atomics-64.ll
				; RUN: llc < %s -mtriple=powerpc-apple-darwin -march=ppc64 | FileCheck %s --check-prefix=CHECK --check-prefix=PPC64

				; sync 1 + sync 1 -> sync 1
				define i32 @acquire_release(i32* %mem) {
				; CHECK-LABEL: acquire_release
				; CHECK: lwz
				; CHECK: sync 1
				; CHECK-NOT: sync
				; CHECK: stw
				%val = load atomic i32* %mem acquire, align 4
				store atomic i32 42, i32* %mem release, align 4
				ret i32 %val
				}

				; sync 0 + sync 1 -> sync 0
				define i32 @acquire_seq_cst(i32* %mem) {
				; CHECK-LABEL: acquire_seq_cst
				; CHECK: lwz
				; CHECK: sync 0
				; CHECK-NOT: sync
				; CHECK: stw
				%val = load atomic i32* %mem acquire, align 4
				store atomic i32 42, i32* %mem seq_cst, align 4
				ret i32 %val
				}

				; The insertion of a fence after the monotonic store allows the elimination of
				; the fence before the release store.
				define void @basic-pre(i32* %mem, i1 %cond) {
				; CHECK-LABEL: basic-pre
				entry:
				; CHECK-LABEL: BB#0
				; CHECK: lwz
				; CHECK: sync 1
				; CHECK-NOT: sync
				; CHECK: b
				%val1 = load atomic i32* %mem acquire, align 4
				br i1 %cond, label %branch, label %exit

				branch:
				; CHECK-LABEL: BB#1
				; CHECK-NOT: sync
				; CHECK: stw
				; CHECK: sync 1
				store atomic i32 21, i32* %mem monotonic, align 4
				br label %exit

				exit:
				; CHECK-LABEL: LBB2
				; CHECK-NOT: sync
				; CHECK: stw
				store atomic i32 42, i32* %mem release, align 4
				ret void
				}
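
For orientation, the `basic-pre` test corresponds roughly to the following C++ source. This is a sketch only: the function name `basicPre` is hypothetical, LLVM's `monotonic` ordering is what C++ calls `memory_order_relaxed`, and the IR a real front end emits for this code may differ in detail from the hand-written test.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical source-level counterpart of @basic-pre.
void basicPre(std::atomic<int32_t> &Mem, bool Cond) {
  // entry: acquire load, lowered on PowerPC to a load followed by a fence.
  int32_t Val1 = Mem.load(std::memory_order_acquire);
  (void)Val1; // unused, as in the IR test

  // branch: relaxed (LLVM "monotonic") store, which needs no fence by itself.
  if (Cond)
    Mem.store(21, std::memory_order_relaxed);

  // exit: release store, normally lowered to a fence followed by a store.
  Mem.store(42, std::memory_order_release);
}
```

The release store in `exit` needs a fence between it and any earlier memory access. On the path that skips `branch`, the fence after the acquire load already provides that; on the path through `branch`, only the relaxed store intervenes, so a fence placed right after it covers the remaining case. That is why the test expects a `sync 1` in `branch` and none in `exit`.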

Revision Contents

Diff 15211
