This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Abstract parallel IR analyses & optimizations + OpenMP implementations
Needs Review · Public

Authored by jdoerfert on May 23 2018, 4:16 PM.

This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

This patch is part of an RFC to add an abstract parallel IR interface
that allows us to analyze and optimize parallel codes in different
representations.

The relationship of the parts contained in this initial commit is shown
below. The attribute annotator transformation pass will query the
abstract parallel region and communication info interfaces to determine
if communicated values can be tagged as no-alias, no-capture, readnone,
or readonly. If so, this is done through the abstract ParallelIR/Builder
interface. Both the analysis parts and the builder interface are
implemented for the OpenMP KMPC runtime library call representation that
is used by clang.

      Optimization         Analysis/Transformation           Implementation
---------------------------------------------------------------------------
                     /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
                     |                                     |
AttributeAnnotator --|---> ParallelCommunicationInfo (A) --/
                     |
                     \---> ParallelIR/Builder (T) -----------> KMPCImpl (T)
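
To make the flow in the diagram concrete, here is a minimal, self-contained sketch of an annotator that first queries a communication-info analysis and then records attributes through a builder. It is not the code from this patch: `Attr`, `CommunicationInfo`, `Builder`, and `annotate` are simplified, hypothetical stand-ins, and the rule of copying attributes already known on the parallel side over to the sequential side is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the attribute kinds named in the summary.
enum class Attr { NoAlias, NoCapture, ReadNone, ReadOnly };

// Toy analysis (A): what is known about each communicated value on the
// parallel side. Loosely modeled on ParallelCommunicationInfo.
struct CommunicationInfo {
  std::vector<std::set<Attr>> ParallelSideAttrs;
  bool Annotatable = true;

  std::size_t getNumCommunicatedValues() const {
    return ParallelSideAttrs.size();
  }
  bool hasAnnotatableCommunication() const { return Annotatable; }
  bool hasAttributeInParallelRegion(std::size_t Idx, Attr Kind) const {
    return ParallelSideAttrs[Idx].count(Kind) != 0;
  }
};

// Toy transformation (T): records attributes for the sequential side.
// Loosely modeled on ParallelIR/Builder.
struct Builder {
  std::vector<std::set<Attr>> SequentialSideAttrs;

  // Returns true if the attribute was not present before.
  bool addAttribute(std::size_t Idx, Attr Kind) {
    if (SequentialSideAttrs.size() <= Idx)
      SequentialSideAttrs.resize(Idx + 1);
    return SequentialSideAttrs[Idx].insert(Kind).second;
  }
};

// Query the analysis, then annotate through the builder.
// Returns the number of attributes that were newly added.
unsigned annotate(const CommunicationInfo &CI, Builder &B) {
  if (!CI.hasAnnotatableCommunication())
    return 0;
  const Attr Candidates[] = {Attr::NoAlias, Attr::NoCapture, Attr::ReadNone,
                             Attr::ReadOnly};
  unsigned Added = 0;
  for (std::size_t Idx = 0; Idx < CI.getNumCommunicatedValues(); ++Idx)
    for (Attr Kind : Candidates)
      if (CI.hasAttributeInParallelRegion(Idx, Kind) &&
          B.addAttribute(Idx, Kind))
        ++Added;
  return Added;
}
```

In this sketch the annotator never touches a concrete representation; only the two toy interfaces would need a KMPC-specific implementation, which mirrors the split between the (A) and (T) columns above.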

In addition to the attribute annotator, we have four more parallel IR
specific optimizations that achieve high speedups for Rodinia OpenMP
benchmarks (see [0]). However, to keep this first commit simple, only
a simplified form of our attribute annotator was included.

[0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf

Diff Detail

Repository

rL LLVM

Build Status

Buildable 18986
Build 18986: arc lint + arc unit

Event Timeline

jdoerfert created this revision. (May 23 2018, 4:16 PM)

Herald added subscribers: llvm-commits, guansong, mgorny, mehdi_amini. (May 23 2018, 4:16 PM)

Harbormaster completed remote builds in B18538: Diff 148314. (May 23 2018, 4:17 PM)

jdoerfert edited the summary of this revision. (May 23 2018, 4:18 PM)

jdoerfert removed a subscriber: llvm-commits.

hfinkel added inline comments.

I've added a few specific comments, but I think that you should move forward with the associated RFC.

As a general point, I wonder how much of the logic here which recognizes kmp_* functions could be replaced with attributes/metadata on the functions themselves. If everything needed could be done this way, then it could be done, perhaps, for other runtime functions used by other frontends without optimizer modifications, and moreover, maybe we could use it for C++ lambdas "for free." I realize that this might apply to what is used for this attribute-propagation logic, although it might not be true for other parallelism-aware optimizations (e.g., barrier removal, region fusion, etc.). Nevertheless, it might be worthwhile even if only useful for information propagation.

include/llvm/Analysis/ParallelIR/RegionInfo.h
38	I wonder if there's any value in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful. In general, I wonder how much of this can be generalized to handle C++ lambdas.
220	Given that you have subclass ids in the parent class definition, it seems like we have a closed class hierarchy, and so we might as well use LLVM's isa/dyn_cast classof-based system (as that's more efficient than making virtual function calls when doing typeof-like testing).
lib/Transforms/ParallelIR/AttributeAnnotator.cpp
19	When you add support for the new pass manager, can you make it a CGSCC pass there?
219	I think that I understand why you're doing this, but it really deserves a comment. Also, I don't think that it gives you all of what you want. Even an identified function local could have been captured into a global, and that global could be accessed in the parallel code. If you intend to rule out aliasing via that channel, you also need to explicitly ensure that the value is not captured before the dispatching call site. You can do that with PointerMayBeCapturedBefore. The trick is that if you have multiple successive dispatch calls, you need the first one not to capture everything, as that would inhibit the transformation for later parallel-region dispatches. I'm guessing that propagating the nocapture attribute will do this.
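
The capture concern in the comment on line 219 can be illustrated with a small toy model. The sketch below is not LLVM code and does not call PointerMayBeCapturedBefore; `Event` and `mayBeCapturedBefore` are hypothetical names, and a program is modeled as a straight-line list of events, which is an assumption that ignores control flow entirely. A pointer counts as possibly captured before a dispatch if it was earlier stored to a global or passed to an earlier dispatch that is not marked nocapture.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One event in a straight-line toy program (hypothetical model, not LLVM IR).
struct Event {
  enum EventKind { StoreToGlobal, Dispatch };

  EventKind Kind;
  std::string Ptr; // Name of the pointer that is stored or passed.
  bool NoCapture;  // Only meaningful for Dispatch events.

  Event(EventKind Kind, std::string Ptr, bool NoCapture = false)
      : Kind(Kind), Ptr(std::move(Ptr)), NoCapture(NoCapture) {}
};

// Return true if Ptr may have escaped before the event at DispatchIdx.
bool mayBeCapturedBefore(const std::vector<Event> &Prog,
                         const std::string &Ptr, std::size_t DispatchIdx) {
  for (std::size_t I = 0; I < DispatchIdx && I < Prog.size(); ++I) {
    const Event &E = Prog[I];
    if (E.Ptr != Ptr)
      continue;
    // Storing the pointer into a global makes it reachable from anywhere.
    if (E.Kind == Event::StoreToGlobal)
      return true;
    // An earlier dispatch captures the pointer unless it is known not to.
    if (E.Kind == Event::Dispatch && !E.NoCapture)
      return true;
  }
  return false;
}
```

With two successive dispatches of the same pointer, the second one is only safe in this model when the first is marked nocapture, which matches the guess at the end of the comment that propagating nocapture resolves the ordering problem.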

jdoerfert added inline comments.

Thanks for these initial comments. A second revision and the actual RFC mail are coming.

include/llvm/Analysis/ParallelIR/RegionInfo.h
38	> I wonder if there's any value here in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful.

The problem I have with captured is that it is supposed to be either "by-value" or "by-reference". Neither conveys the direction in which information is flowing, which is important here. However, I do not have a strong opinion on the naming scheme I used, and if people have arguments to change any of the names I'm not opposed.

> In general, I wonder how much of this can be generalized to handle C++ lambdas.

I think it should generalize quite nicely. There are three reasons for this interface. First, it provides a unified API for both "outlined regions", e.g., OpenMP runtime calls, and "embedded regions", e.g., Tapir/IntelPIR. While the former provides parameter/argument attributes (readonly, writeonly, ...), we can use an analysis to determine the communication patterns for the latter too. Second, it helps to overcome the indirection that separates "outlined regions" from their original source location. This basically means this interface maps runtime library call arguments to parameters of the outlined function. Third, this interface can hide indirection through a struct, as for example used by the GOMP or pthreads library. Especially the last two points should be interesting for lambdas too if they get passed as callbacks.
220	I want to minimize the casting to subclasses of the parallel region as much as possible, but since it is not always avoidable I will look into adopting the isa/dyn_cast system here.
lib/Transforms/ParallelIR/AttributeAnnotator.cpp
19	Currently, both pass managers are supported. The scheme is the same as for most of LLVM currently. Once these parallel IR passes drop support for the old pass manager, changing to a CGSCC pass should not be a problem (if I understood Chandler correctly).
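
For readers unfamiliar with the classof-based scheme discussed for line 220, the following is a minimal sketch of the idea on a toy hierarchy. It deliberately does not include LLVM's Casting.h: `Region`, `ForkRegion`, `TaskRegion`, `isaSketch`, and `dynCastSketch` are hypothetical names, and the two templates are simplified stand-ins for `llvm::isa<>` and `llvm::dyn_cast<>`, not their real implementations. The point is that the kind tag lives in the base class and is read without a virtual call.

```cpp
#include <cassert>

// Toy closed hierarchy with a kind tag stored in the base class.
class Region {
public:
  enum RegionKind { RK_Fork, RK_Task };

  explicit Region(RegionKind K) : Kind(K) {}
  RegionKind getKind() const { return Kind; } // Non-virtual on purpose.

private:
  const RegionKind Kind;
};

class ForkRegion : public Region {
public:
  ForkRegion() : Region(RK_Fork) {}
  // Each subclass states which kind tags it accepts.
  static bool classof(const Region *R) { return R->getKind() == RK_Fork; }
};

class TaskRegion : public Region {
public:
  TaskRegion() : Region(RK_Task) {}
  static bool classof(const Region *R) { return R->getKind() == RK_Task; }
};

// Simplified stand-in for llvm::isa<>: a type test via classof.
template <typename To> bool isaSketch(const Region *R) {
  return To::classof(R);
}

// Simplified stand-in for llvm::dyn_cast<>: checked downcast or nullptr.
template <typename To> const To *dynCastSketch(const Region *R) {
  return To::classof(R) ? static_cast<const To *>(R) : nullptr;
}
```

A caller would write `if (const ForkRegion *F = dynCastSketch<ForkRegion>(R))` where the patch currently calls a virtual `isKind`; whether that fits the intermediate KMPC_ParallelRegion class in this patch is left open here.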

Fix capture problem and small improvements

Harbormaster completed remote builds in B18985: Diff 150110. (Jun 6 2018, 4:45 AM)

Fix spelling and improve comments

Harbormaster completed remote builds in B18986: Diff 150111. (Jun 6 2018, 4:50 AM)

jdoerfert edited the summary of this revision. (Jun 6 2018, 6:04 AM)

xbolva00 added a subscriber: xbolva00. (Jun 7 2018, 6:04 AM)

rogfer01 added a subscriber: rogfer01. (Jun 10 2018, 10:38 PM)

ggeorgakoudis added a subscriber: ggeorgakoudis. (Nov 10 2020, 1:46 PM)

Herald added a project: Restricted Project. (Nov 10 2020, 1:46 PM)

Herald added subscribers: nikic, sstefan1, bollu, yaxunl.

Revision Contents

Path

Size

include/

llvm/

Analysis/

ParallelIR/

198 lines

348 lines

7 lines

5 lines

3 lines

Transforms/

ParallelIR.h

28 lines

ParallelIR/

AttributeAnnotator.h

30 lines

Builder.h

50 lines

lib/

Analysis/

Analysis.cpp

1 line

CMakeLists.txt

3 lines

ParallelIR/

KMPCImpl.cpp

282 lines

RegionInfo.cpp

174 lines

Passes/

LLVMBuild.txt

2 lines

PassBuilder.cpp

2 lines

PassRegistry.def

2 lines

Transforms/

CMakeLists.txt

1 line

IPO/

LLVMBuild.txt

2 lines

PassManagerBuilder.cpp

12 lines

LLVMBuild.txt

2 lines

ParallelIR/

AttributeAnnotator.cpp

388 lines

42 lines

13 lines

123 lines

22 lines

	Transforms/	ParallelIR/
		Passes/

LLVMBuild.txt

9 lines

test/

Other/

opt-O3-pipeline.ll

3 lines

Transforms/

ParallelIR/

kmpc_arg_attributes.ll

139 lines

kmpc_arg_attributes2.ll

243 lines

kmpc_noalias_arg.ll

145 lines

tools/

bugpoint/

CMakeLists.txt

1 line

LLVMBuild.txt

1 line

opt/

CMakeLists.txt

1 line

LLVMBuild.txt

1 line

opt.cpp

1 line

Diff 150111

include/llvm/Analysis/ParallelIR/KMPCImpl.h

This file was added.

				//===- ParallelIR/KMPCImpl.h - KMPC parallel region impl. -------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Interface of the parallel regions for the OpenMP KMPC runtime library call
				// representation.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_ANALYSIS_PARALLELIR_KMPCIMPL_H
				#define LLVM_ANALYSIS_PARALLELIR_KMPCIMPL_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/ParallelIR/RegionInfo.h"
				#include "llvm/IR/Instructions.h"

				namespace llvm {

				class DominatorTree;
				class KMPC_ParallelRegion;

				/// See @p ParallelIRCommunicationInfo
				class KMPC_CommunicationInfo : public ParallelIRCommunicationInfo {

				/// Backwards reference to the parallel region for which this communication
				/// interface was created.
				const KMPC_ParallelRegion &PR;

				KMPC_CommunicationInfo(const KMPC_ParallelRegion &PR) : PR(PR) {}

				public:

				/// Return the runtime library call that initiates the communication.
				CallInst &getRTCall() const;

				/// See @p ParallelIRCommunicationInfo::getAllCommunicatingParallelRegions
				virtual bool getAllCommunicatingParallelRegions(
				SmallVectorImpl<ParallelRegion *> &CommunicatingParallelRegions)
				const override;

				/// See @p ParallelIRCommunicationInfo::getNumCommunicatedValues
				virtual unsigned getNumCommunicatedValues() const override;

				/// See @p ParallelIRCommunicationInfo::getCommunicatedValue
				virtual Value *getCommunicatedValue(unsigned Idx) const override;

				/// See @p ParallelIRCommunicationInfo::getCommunicationKind
				virtual CommunicationKind getCommunicationKind(unsigned Idx) const override;

				/// See @p ParallelIRCommunicationInfo::getCommunicatedValues
				virtual void getCommunicatedValues(
				SmallVectorImpl<Value *> &CommunicatedValues) const override;

				/// See @p ParallelIRCommunicationInfo::getCommunicatedValueInParallelRegion
				virtual Value *
				getCommunicatedValueInParallelRegion(unsigned Idx) const override;

				/// See @p ParallelIRCommunicationInfo::hasAnnotatableCommunication
				virtual bool hasAnnotatableCommunication() const override;

				/// See @p ParallelIRCommunicationInfo::hasAttributeInParallelRegion
				virtual bool
				hasAttributeInParallelRegion(unsigned Idx,
				Attribute::AttrKind Kind) const override;

				friend class KMPC_ParallelRegion;
				};

				/// Specialization of the ParallelRegion interface for the OpenMP KMPC runtime
				/// library representation.
				///
				/// Note: This class is abstract as well. It collects the commonalities between
				/// KMPC_ForkParallelRegion and KMPC_TaskParallelRegion defined below.
				///
				/// See @p ParallelRegion
				class KMPC_ParallelRegion : public ParallelRegion {

				/// The subfunction that contains the parallel code.
				Function &ParallelSubFn;

				/// The communication info object for this parallel region.
				KMPC_CommunicationInfo CommunicationInfo;

				public:
				KMPC_ParallelRegion(CallInst &KMPC_CI, Function &ParallelSubFn,
				ParallelIRRegionInfo &PRI)
				: ParallelRegion(KMPC_CI, KMPC_CI, PRI), ParallelSubFn(ParallelSubFn),
				CommunicationInfo(*this) {}

				/// Return the parallel subfunction for this parallel region.
				///
				/// Note that there might be multiple regions sharing the same parallel
				/// subfunction.
				/// See @p ParallelIRCommunicationInfo::getAllCommunicatingParallelRegions
				Function &getParallelSubFn() const { return ParallelSubFn; }

				/// See @p ParallelRegion::getFirstInsertionPoint
				virtual Instruction &getFirstInsertionPoint() const override;

				/// See @p ParallelRegion::getSequentialCodeFunction
				virtual Function &getSequentialCodeFunction() const override;

				/// See @p ParallelRegion::getParallelCodeFunction
				virtual Function &getParallelCodeFunction() const override;

				/// See @p ParallelRegion::getDefiniteBarriers
				virtual void getDefiniteBarriers(
				SmallVectorImpl<Instruction *> &DefiniteBarriers) const override;

				/// See @p ParallelRegion::getPotentialBarriers
				virtual void getPotentialBarriers(
				SmallVectorImpl<Instruction *> &PotentialBarriers) const override;

				/// See @p ParallelRegion::getThreadId
				virtual Value *getThreadId() const override;

				/// See @p ParallelRegion::getLocalThreadId
				virtual Value *getLocalThreadId() const override;

				/// See @p ParallelRegion::contains
				virtual bool contains(const BasicBlock *BB,
				const DominatorTree *) const override;

				/// See @p ParallelRegion::contains
				virtual bool contains(const Instruction *I,
				const DominatorTree *) const override;

				/// See @p ParallelRegion::visit
				virtual bool visit(InstructionVisitorTy &Visitor) const override;

				/// See @p ParallelRegion::visit
				virtual bool visit(BlockVisitorTy &Visitor) const override;

				/// See @p ParallelRegion::print
				virtual void print(raw_ostream &OS, unsigned indent) const override;

				/// See @p ParallelRegion::getCommunicationInfo
				virtual const ParallelIRCommunicationInfo &
				getCommunicationInfo() const override {
				return CommunicationInfo;
				}
				};

				/// See @p KMPC_ParallelRegion
				class KMPC_ForkParallelRegion : public KMPC_ParallelRegion {

				/// Private constructor, generation via findKMPCForkCalls.
				KMPC_ForkParallelRegion(CallInst &KMPC_ForkCI, Function &ParallelSubFn,
				ParallelIRRegionInfo &PRI);

				public:

				/// See @p ParallelRegion::getKind
				virtual ParallelRegionKind getKind() const override {
				return PRK_KMPC_FORK_RT;
				}

				/// See @p ParallelRegion::isKind
				virtual bool isKind(ParallelRegionKind Kind) const override {
				return Kind | PRK_KMPC_FORK_RT;
				}

				/// Find all KMPC fork calls in @p M and register the KMPC_ForkParallelRegion
				/// parallel regions with @p PRI.
				static void findKMPCForkCalls(Module &M, ParallelIRRegionInfo &PRI);
				};

				/// See @p KMPC_ParallelRegion
				class KMPC_TaskParallelRegion : public KMPC_ParallelRegion {

				/// Private constructor, generation via findKMPCTaskCalls.
				KMPC_TaskParallelRegion(CallInst &KMPC_TaskCI, Function &ParallelSubFn,
				ParallelIRRegionInfo &PRI);

				public:

				/// See @p ParallelRegion::getKind
				virtual ParallelRegionKind getKind() const override {
				return PRK_KMPC_TASK_RT;
				}

				/// See @p ParallelRegion::isKind
				virtual bool isKind(ParallelRegionKind Kind) const override {
				return Kind | PRK_KMPC_TASK_RT;
				}

				/// Find all KMPC task calls in @p M and register the KMPC_TaskParallelRegion
				/// parallel regions with @p PRI.
				static void findKMPCTaskCalls(Module &M, ParallelIRRegionInfo &PRI);
				};

				} // namespace llvm
				#endif
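
A note on the two `isKind` overrides in this header: as written, they return the bitwise OR of the queried kind and a nonzero constant, which converts to `true` for every query. The sketch below reproduces that behavior with the enum values from RegionInfo.h and contrasts it with one possible alternative, a subset test on the bits. `isKindAsWritten` and `isKindSubset` are hypothetical helper names, and the subset reading is an assumption about the intent, not something the patch states.

```cpp
#include <cassert>

// Values copied from the ParallelRegionKind enum in RegionInfo.h.
enum ParallelRegionKind {
  PRK_KMPC_RT = 4,      // 0b100
  PRK_KMPC_FORK_RT = 5, // 0b101
  PRK_KMPC_TASK_RT = 6, // 0b110
};

// The check as it appears in the diff: OR-ing with a nonzero kind can
// never produce zero, so this is true for every Query.
bool isKindAsWritten(ParallelRegionKind Self, ParallelRegionKind Query) {
  return (Query | Self) != 0;
}

// Assumed alternative: Query matches if all of its bits are set in Self,
// so a fork region is also a general KMPC region but not a task region.
bool isKindSubset(ParallelRegionKind Self, ParallelRegionKind Query) {
  return (Query & Self) == Query;
}
```

Under the subset reading, a fork region answers true for PRK_KMPC_RT and PRK_KMPC_FORK_RT and false for PRK_KMPC_TASK_RT. Adopting the isa/dyn_cast scheme raised in the review would make this bit encoding unnecessary.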

include/llvm/Analysis/ParallelIR/RegionInfo.h

This file was added.

				//===- ParallelIRRegionInfo.h - Parallel region analysis --------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Abstract analyses interfaces to inspect parallel codes and passes to provide
				// information about parallelism inside the module.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_ANALYSIS_PARALLELIR_REGIONINFO_H
				#define LLVM_ANALYSIS_PARALLELIR_REGIONINFO_H

				#include "llvm/ADT/SetVector.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/Pass.h"

				namespace llvm {

				class ParallelRegion;
				class ParallelIRBuilder;

				/// Helper structure that decouples communication related queries from the
				/// parallel region interface.
				///
				/// Communication is conceptually divided into two parts, the sequential and the
				/// parallel one. The communicated values could be different on the other side
				/// and they are therefore only identified by their index. The way values can be
				/// communicated is either by-value or inside a container, thus through a
				/// pointer type that is dereferenced.
				struct ParallelIRCommunicationInfo {

				[Inline comment] hfinkel: I wonder if there's any value in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful. In general, I wonder how much of this can be generalized to handle C++ lambdas.
				[Inline reply] jdoerfert:
				> I wonder if there's any value here in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful.
				The problem I have with captured is that it is supposed to be either "by-value" or "by-reference". Neither conveys the direction in which information is flowing, which is important here. However, I do not have a strong opinion on the naming scheme I used, and if people have arguments to change any of the names I'm not opposed.
				> In general, I wonder how much of this can be generalized to handle C++ lambdas.
				I think it should generalize quite nicely. There are three reasons for this interface. First, it provides a unified API for both "outlined regions", e.g., OpenMP runtime calls, and "embedded regions", e.g., Tapir/IntelPIR. While the former provides parameter/argument attributes (readonly, writeonly, ...), we can use an analysis to determine the communication patterns for the latter too. Second, it helps to overcome the indirection that separates "outlined regions" from their original source location. This basically means this interface maps runtime library call arguments to parameters of the outlined function. Third, this interface can hide indirection through a struct, as for example used by the GOMP or pthreads library. Especially the last two points should be interesting for lambdas too if they get passed as callbacks.
				/// Return all parallel regions that might be involved with either side of
				/// the parallel communication interface. There might be multiple if code was
				/// reused or parts of the interface were duplicated, e.g., through unrolling.
				virtual bool getAllCommunicatingParallelRegions(
				SmallVectorImpl<ParallelRegion *> &CommunicatingParallelRegions)
				const = 0;

				/// Flags to distinguish the different kinds of communication.
				enum CommunicationKind {
				CK_VALUE, ///< communication by-value
				CK_CONTAINER_IN, ///< communication through a read-only container
				CK_CONTAINER_OUT, ///< communication through a write-only container
				CK_CONTAINER_IN_OUT, ///< communication through a container
				CK_UNKNOWN, ///< unknown/complicated communication
				};

				/// Return the number of communicated values.
				virtual unsigned getNumCommunicatedValues() const = 0;

				/// Return the communicated value number @p Idx in the sequential code.
				virtual Value *getCommunicatedValue(unsigned Idx) const = 0;

				/// Return the communicated value number @p Idx in the parallel code.
				virtual Value *getCommunicatedValueInParallelRegion(unsigned Idx) const = 0;

				/// Return the kind of communication used for value number @p Idx.
				virtual CommunicationKind getCommunicationKind(unsigned Idx) const = 0;

				/// Return all communicated values in the sequential code.
				virtual void getCommunicatedValues(
				SmallVectorImpl<Value *> &CommunicatedValues) const = 0;

				/// Return true if the communication can be annotated with attributes.
				virtual bool hasAnnotatableCommunication() const = 0;

				/// Return true if the communicated value @p Idx in the parallel code has
				/// attribute @p Kind. If attribute annotation is not possible this function
				/// shall gracefully return false.
				virtual bool hasAttributeInParallelRegion(unsigned Idx,
				Attribute::AttrKind Kind) const = 0;
				};

				/// The parallel region info (PRI) identifies parallel regions and provides
				/// convenient information on them.
				///
				/// Currently the parallel region info is "lazy" in the sense that it does only
				/// need to be updated if new parallel regions are created (or deleted). As this
				/// should not happen very often (and only in very few places) it allows
				/// transformation passes to preserve the parallel region info without
				/// modifications. Additionally, it makes the analysis very lightweight in the
				/// absence of parallel regions (which should be the majority of functions).
				///
				class ParallelIRRegionInfo {
				public:

				/// Container type for parallel regions.
				using ParallelRegionContainer = SmallVector<ParallelRegion *, 4>;
				using ParallelRegionContainerMap =
				DenseMap<Function *, ParallelRegionContainer>;

				/// Iterator types for the parallel region container.
				using iterator = ParallelRegionContainerMap::iterator;
				using const_iterator = ParallelRegionContainerMap::const_iterator;

				private:

				/// The parallel regions discovered in the program.
				ParallelRegionContainerMap ParallelRegionsMap;

				/// Register the parallel region @p PR.
				void addParallelRegion(ParallelRegion &PR);

				public:
				ParallelIRRegionInfo() {}
				ParallelIRRegionInfo(Module &M) {
				recalculate(M);
				}
				~ParallelIRRegionInfo() { releaseMemory(); }

				/// Identify the parallel regions in @p M from scratch.
				void recalculate(Module &M);

				/// Return the parallel region for @p I if any.
				ParallelRegion *getParallelRegionFor(Instruction *I) const;

				/// Return a vector with all parallel regions in this function.
				///
				///{
				ParallelRegionContainer getParallelRegions(Function &F) const {
				return ParallelRegionsMap.lookup(&F);
				}
				///}

				/// Iterators to visit all parallel regions, function by function
				///
				///{
				iterator begin() { return ParallelRegionsMap.begin(); }
				iterator end() { return ParallelRegionsMap.end(); }

				const_iterator begin() const { return ParallelRegionsMap.begin(); }
				const_iterator end() const { return ParallelRegionsMap.end(); }
				///}

				/// Delete all memory allocated for parallel regions.
				void releaseMemory();

				/// Pretty print all parallel regions.
				///{
				void print(raw_ostream &OS) const;

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void dump() const;
				#endif
				///}

				friend class ParallelRegion;
				friend class ParallelIRBuilder;
				};

				/// A parallel region is a single-entry, single-exit CFG region that
				/// represents code that can be executed in parallel.
				class ParallelRegion {

				/// The start point of this parallel region.
				Instruction &StartPoint;

				/// The end point of this parallel region.
				Instruction &EndPoint;

				/// The parallel region info analysis.
				ParallelIRRegionInfo &PRI;

				protected:

				ParallelRegion(Instruction &StartPoint, Instruction &EndPoint,
				ParallelIRRegionInfo &PRI)
				: StartPoint(StartPoint), EndPoint(EndPoint), PRI(PRI) {
				// Register a new parallel region always with the parallel region info.
				PRI.addParallelRegion(*this);
				}

				public:
				virtual ~ParallelRegion();

				/// Return the start point of this parallel region.
				Instruction &getStartPoint() const { return StartPoint; }

				/// Return the end point of this parallel region.
				Instruction &getEndPoint() const { return EndPoint; }

				/// Return the first instruction in this parallel region before which new code
				/// can be inserted.
				virtual Instruction &getFirstInsertionPoint() const = 0;

				/// Return the function that contains the sequential code surrounding the
				/// parallel region.
				virtual Function &getSequentialCodeFunction() const = 0;

				/// Return the function that contains the code executed in parallel.
				virtual Function &getParallelCodeFunction() const = 0;

				/// Return all definite barrier instructions.
				virtual void getDefiniteBarriers(
				SmallVectorImpl<Instruction *> &DefiniteBarriers) const = 0;

				/// Return all potential barrier instructions.
				virtual void getPotentialBarriers(
				SmallVectorImpl<Instruction *> &PotentialBarriers) const = 0;

				/// Return true if @p I might have barrier semantics for this parallel region.
				virtual bool isPotentialBarrier(Instruction &I) const;

				/// Enumeration of all known parallel region kinds.
				enum ParallelRegionKind {
				PRK_KMPC_RT = 4, ///< General KMPC runtime call
				PRK_KMPC_FORK_RT = 5, ///< KMPC fork runtime call
				PRK_KMPC_TASK_RT = 6, ///< KMPC task runtime call
				};

				/// Return the kind of this parallel region.
				virtual ParallelRegionKind getKind() const = 0;

				[Inline comment] hfinkel: Given that you have subclass ids in the parent class definition, it seems like we have a closed class hierarchy, and so we might as well use LLVM's isa/dyn_cast classof-based system (as that's more efficient than making virtual function calls when doing typeof-like testing).
				[Inline reply] jdoerfert: I want to minimize the casting to subclasses of the parallel region as much as possible but since it is not always avoidable I will look into adopting the isa/dyn_cast system here.
				/// Return true if this parallel region is of kind @p Kind.
				virtual bool isKind(ParallelRegionKind Kind) const = 0;

				/// Return the global thread id if applicable and present.
				///
				/// Note: The ParallelIR/Builder interface allows creating a new thread id.
				virtual Value *getThreadId() const = 0;

				/// Return the local thread id if applicable and present.
				///
				/// Note: The ParallelIR/Builder interface allows creating a new thread id.
				virtual Value *getLocalThreadId() const = 0;

				/// Return a lightweight communication info object for this parallel region.
				virtual const ParallelIRCommunicationInfo &getCommunicationInfo() const = 0;

				/// Type of the instruction visitor function.
				///
				/// It will be invoked for every instruction in this parallel region until the
				/// return value of the visitor is false. Note that only proper instructions
				/// inside the parallel region are visited, thus no encoding instructions
				/// only present to mark the parallel region.
				///
				/// The return value indicates if the traversal should continue.
				using InstructionVisitorTy = std::function<bool(Instruction &)>;

				/// Type of the block visitor function.
				///
				/// It will be invoked for every basic block in this parallel region until the
				/// return value of the visitor is false. The second argument is true only if
				/// the block is not completely contained in the parallel region.
				///
				/// The return value indicates if the traversal should continue.
				using BlockVisitorTy = std::function<bool(BasicBlock &, bool /* boundary */)>;

				/// A generic visitor interface as an alternative to an iterator.
				///
				/// @returns True, if all instructions/blocks have been visited.
				///{
				virtual bool visit(InstructionVisitorTy &Visitor) const = 0;
				virtual bool visit(BlockVisitorTy &Visitor) const = 0;
				///}

				/// The contain interface is designed deliberately different from similar
				/// functions like Loop::contains(*) as it might take a dominator tree as a
				/// second argument. This allows the ParallelRegion to remain valid even if
				/// transformations change the CFG structure inside. As a consequence there
				/// are fewer modifications needed in the existing code base.
				///{
				virtual bool contains(const BasicBlock *BB,
				const DominatorTree *DT = nullptr) const;
				virtual bool contains(const Instruction *I,
				const DominatorTree *DT = nullptr) const;
				///}

				const ParallelIRRegionInfo &getParallelRegionInfo() const { return PRI; };

				/// Pretty print this parallel region.
				///{
				virtual void print(raw_ostream &OS, unsigned indent = 0) const = 0;

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void dump() const;
				#endif
				///}

				friend class ParallelIRRegionInfo;
				};

				/// Pretty print the parallel region @p PR to @p OS.
				inline raw_ostream &operator<<(raw_ostream &OS, const ParallelRegion &PR) {
				PR.print(OS);
				return OS;
				}

				/// New pass manager wrapper pass around the parallel region info.
				class ParallelIRRegionAnalysis
				: public AnalysisInfoMixin<ParallelIRRegionAnalysis> {
				friend AnalysisInfoMixin<ParallelIRRegionAnalysis>;
				static AnalysisKey Key;

				public:
				typedef ParallelIRRegionInfo Result;

				/// Run the analysis pass over a module and identify the parallel regions.
				///
				/// FIXME: This does not need to be a module pass but dependent passes in the
				/// old pass manager do not work otherwise.
				ParallelIRRegionInfo run(Module &M, ModuleAnalysisManager &MAM);
				};

				/// Module pass wrapper around the parallel region info.
				///
				/// FIXME: This does not need to be a module pass but dependent passes in the
				/// old pass manager do not work otherwise.
				class ParallelIRRegionInfoPass : public ModulePass {
				ParallelIRRegionInfo PRI;

				public:
				static char ID;
				ParallelIRRegionInfoPass() : ModulePass(ID) {}

				/// Return the parallel region info analysis.
				///{
				ParallelIRRegionInfo &getParallelIRRegionInfo() { return PRI; }
				const ParallelIRRegionInfo &getParallelIRRegionInfo() const { return PRI; }
				///}

				/// Initialize the parallel region info for this module.
				bool runOnModule(Module &) override;

				/// Verify the analysis as well as some of the functions provided.
				void verifyAnalysis() const override;

				void getAnalysisUsage(AnalysisUsage &AU) const override;

				/// Pretty print the parallel regions of the function.
				///{
				void print(raw_ostream &OS, const Module *) const override;

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void dump() const;
				#endif
				///}
				};

				} // End llvm namespace
				#endif
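
The visitor contract documented for `ParallelRegion::visit` (keep calling the visitor until it returns false, and report whether everything was visited) can be sketched on a toy container. This is an illustration under stated assumptions rather than the patch's implementation: `Inst`, `visitInstructions`, and `containsOpcode` are hypothetical names, and a region is modeled as a plain vector.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for llvm::Instruction.
struct Inst {
  int Opcode;
};

// Same shape as InstructionVisitorTy: return false to stop the traversal.
using InstVisitorTy = std::function<bool(Inst &)>;

// Call Visitor on each instruction until it returns false.
// Returns true only if every instruction was visited.
bool visitInstructions(std::vector<Inst> &Region, InstVisitorTy &Visitor) {
  for (Inst &I : Region)
    if (!Visitor(I))
      return false;
  return true;
}

// Example client: search for an opcode and stop at the first match.
bool containsOpcode(std::vector<Inst> &Region, int Opcode) {
  InstVisitorTy V = [Opcode](Inst &I) { return I.Opcode != Opcode; };
  // An aborted traversal means the opcode was found.
  return !visitInstructions(Region, V);
}
```

The early-exit return value lets a client express a search without a separate iterator type, which is the trade-off the header comment describes as "an alternative to an iterator".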

include/llvm/Analysis/Passes.h

Show First 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	namespace llvm {

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
//		//
// createRegionInfoPass - This pass finds all single entry single exit regions		// createRegionInfoPass - This pass finds all single entry single exit regions
// in a function and builds the region hierarchy.		// in a function and builds the region hierarchy.
//		//
FunctionPass *createRegionInfoPass();		FunctionPass *createRegionInfoPass();

		//===--------------------------------------------------------------------===//
		//
		// createParallelIRRegionInfoPass - This pass finds all parallel regions
		// in a module.
		//
		ModulePass *createParallelIRRegionInfoPass();

// Print module-level debug info metadata in human-readable form.		// Print module-level debug info metadata in human-readable form.
ModulePass *createModuleDebugInfoPrinterPass();		ModulePass *createModuleDebugInfoPrinterPass();

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
//		//
// createMemDepPrinter - This pass exhaustively collects all memdep		// createMemDepPrinter - This pass exhaustively collects all memdep
// information and prints it with -analyze.		// information and prints it with -analyze.
//		//

include/llvm/InitializePasses.h

	void initializeCodeGen(PassRegistry&);			void initializeCodeGen(PassRegistry&);

	/// Initialize all passes linked into the GlobalISel library.			/// Initialize all passes linked into the GlobalISel library.
	void initializeGlobalISel(PassRegistry&);			void initializeGlobalISel(PassRegistry&);

	/// Initialize all passes linked into the CodeGen library.			/// Initialize all passes linked into the CodeGen library.
	void initializeTarget(PassRegistry&);			void initializeTarget(PassRegistry&);

				/// Initialize all passes linked into the ParallelIROpts library.
				void initializeParallelIROpts(PassRegistry&);

	void initializeAAEvalLegacyPassPass(PassRegistry&);			void initializeAAEvalLegacyPassPass(PassRegistry&);
	void initializeAAResultsWrapperPassPass(PassRegistry&);			void initializeAAResultsWrapperPassPass(PassRegistry&);
	void initializeADCELegacyPassPass(PassRegistry&);			void initializeADCELegacyPassPass(PassRegistry&);
	void initializeAddDiscriminatorsLegacyPassPass(PassRegistry&);			void initializeAddDiscriminatorsLegacyPassPass(PassRegistry&);
	void initializeAddressSanitizerModulePass(PassRegistry&);			void initializeAddressSanitizerModulePass(PassRegistry&);
	void initializeAddressSanitizerPass(PassRegistry&);			void initializeAddressSanitizerPass(PassRegistry&);
	void initializeAggressiveInstCombinerLegacyPassPass(PassRegistry&);			void initializeAggressiveInstCombinerLegacyPassPass(PassRegistry&);
	void initializeAliasSetPrinterPass(PassRegistry&);			void initializeAliasSetPrinterPass(PassRegistry&);
	// ...
	void initializeObjCARCAAWrapperPassPass(PassRegistry&);			void initializeObjCARCAAWrapperPassPass(PassRegistry&);
	void initializeObjCARCAPElimPass(PassRegistry&);			void initializeObjCARCAPElimPass(PassRegistry&);
	void initializeObjCARCContractPass(PassRegistry&);			void initializeObjCARCContractPass(PassRegistry&);
	void initializeObjCARCExpandPass(PassRegistry&);			void initializeObjCARCExpandPass(PassRegistry&);
	void initializeObjCARCOptPass(PassRegistry&);			void initializeObjCARCOptPass(PassRegistry&);
	void initializeOptimizationRemarkEmitterWrapperPassPass(PassRegistry&);			void initializeOptimizationRemarkEmitterWrapperPassPass(PassRegistry&);
	void initializeOptimizePHIsPass(PassRegistry&);			void initializeOptimizePHIsPass(PassRegistry&);
	void initializePAEvalPass(PassRegistry&);			void initializePAEvalPass(PassRegistry&);
				void initializeParallelIRRegionInfoPassPass(PassRegistry&);
	void initializePEIPass(PassRegistry&);			void initializePEIPass(PassRegistry&);
	void initializePGOIndirectCallPromotionLegacyPassPass(PassRegistry&);			void initializePGOIndirectCallPromotionLegacyPassPass(PassRegistry&);
	void initializePGOInstrumentationGenLegacyPassPass(PassRegistry&);			void initializePGOInstrumentationGenLegacyPassPass(PassRegistry&);
	void initializePGOInstrumentationUseLegacyPassPass(PassRegistry&);			void initializePGOInstrumentationUseLegacyPassPass(PassRegistry&);
	void initializePGOMemOPSizeOptLegacyPassPass(PassRegistry&);			void initializePGOMemOPSizeOptLegacyPassPass(PassRegistry&);
	void initializePHIEliminationPass(PassRegistry&);			void initializePHIEliminationPass(PassRegistry&);
				void initializeParallelIRAttributeAnnotatorLegacyPassPass(PassRegistry&);
	void initializePartialInlinerLegacyPassPass(PassRegistry&);			void initializePartialInlinerLegacyPassPass(PassRegistry&);
	void initializePartiallyInlineLibCallsLegacyPassPass(PassRegistry&);			void initializePartiallyInlineLibCallsLegacyPassPass(PassRegistry&);
	void initializePatchableFunctionPass(PassRegistry&);			void initializePatchableFunctionPass(PassRegistry&);
	void initializePeepholeOptimizerPass(PassRegistry&);			void initializePeepholeOptimizerPass(PassRegistry&);
	void initializePhysicalRegisterUsageInfoPass(PassRegistry&);			void initializePhysicalRegisterUsageInfoPass(PassRegistry&);
	void initializePlaceBackedgeSafepointsImplPass(PassRegistry&);			void initializePlaceBackedgeSafepointsImplPass(PassRegistry&);
	void initializePlaceSafepointsPass(PassRegistry&);			void initializePlaceSafepointsPass(PassRegistry&);
	void initializePostDomOnlyPrinterPass(PassRegistry&);			void initializePostDomOnlyPrinterPass(PassRegistry&);

include/llvm/LinkAllPasses.h

#include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"		#include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"		#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/InstCombine/InstCombine.h"		#include "llvm/Transforms/InstCombine/InstCombine.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Instrumentation/BoundsChecking.h"		#include "llvm/Transforms/Instrumentation/BoundsChecking.h"
#include "llvm/Transforms/ObjCARC.h"		#include "llvm/Transforms/ObjCARC.h"
		#include "llvm/Transforms/ParallelIR.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"
#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"		#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <cstdlib>		#include <cstdlib>

ForcePassLinking() {
(void) llvm::createPostDomTree();		(void) llvm::createPostDomTree();
(void) llvm::createInstructionNamerPass();		(void) llvm::createInstructionNamerPass();
(void) llvm::createMetaRenamerPass();		(void) llvm::createMetaRenamerPass();
(void) llvm::createPostOrderFunctionAttrsLegacyPass();		(void) llvm::createPostOrderFunctionAttrsLegacyPass();
(void) llvm::createReversePostOrderFunctionAttrsPass();		(void) llvm::createReversePostOrderFunctionAttrsPass();
(void) llvm::createMergeFunctionsPass();		(void) llvm::createMergeFunctionsPass();
(void) llvm::createMergeICmpsPass();		(void) llvm::createMergeICmpsPass();
(void) llvm::createExpandMemCmpPass();		(void) llvm::createExpandMemCmpPass();
		(void) llvm::createParallelIRRegionInfoPass();
		(void) llvm::createParallelIRAttributeAnnotatorLegacyPass();
std::string buf;		std::string buf;
llvm::raw_string_ostream os(buf);		llvm::raw_string_ostream os(buf);
(void) llvm::createPrintModulePass(os);		(void) llvm::createPrintModulePass(os);
(void) llvm::createPrintFunctionPass(os);		(void) llvm::createPrintFunctionPass(os);
(void) llvm::createPrintBasicBlockPass(os);		(void) llvm::createPrintBasicBlockPass(os);
(void) llvm::createModuleDebugInfoPrinterPass();		(void) llvm::createModuleDebugInfoPrinterPass();
(void) llvm::createPartialInliningPass();		(void) llvm::createPartialInliningPass();
(void) llvm::createLintPass();		(void) llvm::createLintPass();

include/llvm/Transforms/ParallelIR.h

This file was added.

				//===-- ParallelIR.h - Parallel IR Transformations --------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This header file defines prototypes for accessor functions that expose passes
				// in the ParallelIR transformations library.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_PARALLEL_IR_H
				#define LLVM_TRANSFORMS_PARALLEL_IR_H

				namespace llvm {

				class ModulePass;

				//===----------------------------------------------------------------------===//
				//
				ModulePass *createParallelIRAttributeAnnotatorLegacyPass();

				} // End llvm namespace

				#endif

include/llvm/Transforms/ParallelIR/AttributeAnnotator.h

This file was added.

				//===- AttributeAnnotator.h ----- Annotate attr. from/to parallel regions -===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// TODO
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_PARALLEL_IR_ATTRIBUTE_ANNOTATOR_H
				#define LLVM_TRANSFORMS_PARALLEL_IR_ATTRIBUTE_ANNOTATOR_H

				#include "llvm/IR/PassManager.h"

				namespace llvm {

				class Module;

				struct ParallelIRAttributeAnnotatorPass
				: PassInfoMixin<ParallelIRAttributeAnnotatorPass> {
				PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
				};

				} // end namespace llvm

				#endif // LLVM_TRANSFORMS_PARALLEL_IR_ATTRIBUTE_ANNOTATOR_H

include/llvm/Transforms/ParallelIR/Builder.h

This file was added.

				//===- ParallelIR/Builder.h - Parallel region IR builder ---------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_IR_PARALLELIR_BUILDER_H
				#define LLVM_IR_PARALLELIR_BUILDER_H

				#include "llvm/Analysis/ParallelIR/RegionInfo.h"

				namespace llvm {

				/// Interface to modify and create parallel regions. Each kind of parallel
				/// region may provide its own implementation of this interface. An
				/// implementation applies the modification requested through these calls if
				/// that is possible. If not, it shall gracefully ignore the request.
				struct ParallelIRBuilder {

				/// Create a parallel IR builder for the region kind @p PRKind.
				static ParallelIRBuilder *Create(ParallelIRRegionInfo &PRI,
				ParallelRegion::ParallelRegionKind PRKind);

				/// Add the attribute @p Kind to the communicated value at index @p Idx in the
				/// sequential part of the communication interface defined by @p PRCI.
				virtual bool
				addAttributeInSequentialRegion(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx,
				Attribute::AttrKind Kind) const = 0;

				/// Add the attribute @p Kind to the communicated value at index @p Idx in the
				/// parallel part of the communication interface defined by @p PRCI.
				virtual bool
				addAttributeInParallelRegion(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx,
				Attribute::AttrKind Kind) const = 0;

				/// Add the attribute @p Kind to the communicated value at index @p Idx in
				/// both parts of the communication interface defined by @p PRCI.
				virtual bool addAttribute(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx, Attribute::AttrKind Kind) const = 0;
				};

				}
				#endif
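
The contract documented for the builder interface (apply a request where possible, otherwise ignore it gracefully) can be illustrated with a standalone sketch. Everything in it is hypothetical: `AttrKind`, `ToyCommunicationInfo`, `ToyBuilder`, `AnnotatingBuilder`, and `IgnoringBuilder` are simplified stand-ins rather than LLVM types, and reading the `bool` result as "the attribute was applied" is an assumption, since the header does not document the return value.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for llvm::Attribute::AttrKind.
enum class AttrKind { NoAlias, NoCapture, ReadNone, ReadOnly };

// Hypothetical model of a communication interface: one attribute set per
// communicated value.
struct ToyCommunicationInfo {
  std::vector<std::set<AttrKind>> Attrs;
};

// Simplified analogue of the builder interface. The return value is assumed
// to mean "the attribute was applied".
struct ToyBuilder {
  virtual ~ToyBuilder() = default;
  virtual bool addAttribute(ToyCommunicationInfo &CI, std::size_t Idx,
                            AttrKind Kind) const = 0;
};

// A region kind whose communicated values can be annotated.
struct AnnotatingBuilder : ToyBuilder {
  bool addAttribute(ToyCommunicationInfo &CI, std::size_t Idx,
                    AttrKind Kind) const override {
    assert(Idx < CI.Attrs.size() && "communicated value index out of range");
    CI.Attrs[Idx].insert(Kind);
    return true;
  }
};

// A region kind that cannot be annotated: the request is silently ignored.
struct IgnoringBuilder : ToyBuilder {
  bool addAttribute(ToyCommunicationInfo &, std::size_t,
                    AttrKind) const override {
    return false;
  }
};
```

A caller written against `ToyBuilder` does not need to know which region kind it is dealing with; it issues the request and, at most, inspects the result.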

lib/Analysis/Analysis.cpp

void llvm::initializeAnalysis(PassRegistry &Registry) {
initializeSCEVAAWrapperPassPass(Registry);		initializeSCEVAAWrapperPassPass(Registry);
initializeScalarEvolutionWrapperPassPass(Registry);		initializeScalarEvolutionWrapperPassPass(Registry);
initializeTargetTransformInfoWrapperPassPass(Registry);		initializeTargetTransformInfoWrapperPassPass(Registry);
initializeTypeBasedAAWrapperPassPass(Registry);		initializeTypeBasedAAWrapperPassPass(Registry);
initializeScopedNoAliasAAWrapperPassPass(Registry);		initializeScopedNoAliasAAWrapperPassPass(Registry);
initializeLCSSAVerificationPassPass(Registry);		initializeLCSSAVerificationPassPass(Registry);
initializeMemorySSAWrapperPassPass(Registry);		initializeMemorySSAWrapperPassPass(Registry);
initializeMemorySSAPrinterLegacyPassPass(Registry);		initializeMemorySSAPrinterLegacyPassPass(Registry);
		initializeParallelIRRegionInfoPassPass(Registry);
}		}

void LLVMInitializeAnalysis(LLVMPassRegistryRef R) {		void LLVMInitializeAnalysis(LLVMPassRegistryRef R) {
initializeAnalysis(*unwrap(R));		initializeAnalysis(*unwrap(R));
}		}

void LLVMInitializeIPA(LLVMPassRegistryRef R) {		void LLVMInitializeIPA(LLVMPassRegistryRef R) {
initializeAnalysis(*unwrap(R));		initializeAnalysis(*unwrap(R));

lib/Analysis/CMakeLists.txt

add_llvm_library(LLVMAnalysis
TypeBasedAliasAnalysis.cpp		TypeBasedAliasAnalysis.cpp
TypeMetadataUtils.cpp		TypeMetadataUtils.cpp
ScopedNoAliasAA.cpp		ScopedNoAliasAA.cpp
ValueLattice.cpp		ValueLattice.cpp
ValueLatticeUtils.cpp		ValueLatticeUtils.cpp
ValueTracking.cpp		ValueTracking.cpp
VectorUtils.cpp		VectorUtils.cpp

		ParallelIR/RegionInfo.cpp
		ParallelIR/KMPCImpl.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis		${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Analysis/ParallelIR/KMPCImpl.cpp

This file was added.

				//===- KMPCImpl.cpp - OpenMP runtime (KMPC) parallel region impl. ---------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Implementation of the parallel regions for the OpenMP KMPC runtime library
				// call representation.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/ParallelIR/KMPCImpl.h"

				#include "llvm/IR/Verifier.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "parallel-region-info"

				//===----------------------------------------------------------------------===//
				// KMPC runtime parallel region implementation
				//

				/// Return true if @p V is a call to the function @p CalledDecl. If @p F is
				/// not null, the call must also be located in @p F.
				static bool isCallToFunctionIn(Value *V, Function *CalledDecl, Function *F) {
				auto *CI = dyn_cast<CallInst>(V);
				return (CI && CI->getCalledFunction() == CalledDecl &&
				(!F || CI->getFunction() == F));
				}

				/// Put all calls to @p Name in @p F (or anywhere in @p M if @p F is null)
				/// into the container @p Calls as @p RetTy.
				template<typename RetTy>
				static void collectCallsToInFunction(std::string Name, Function *F, Module &M,
				SmallVectorImpl<RetTy *> &Calls) {

				// Look for the "Name" function declaration in the Module. If found, the users
				// are possible calls.
				Function *FunctionDecl = M.getFunction(Name);
				if (!FunctionDecl)
				return;

				for (User *U : FunctionDecl->users())
				if (isCallToFunctionIn(U, FunctionDecl, F))
				Calls.push_back(cast<RetTy>(U));
				}
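
The lookup strategy used by the two helpers above (start from the callee declaration, walk its users, keep only direct calls, optionally restricted to one enclosing function) can be sketched without LLVM. `ToyFunction`, `ToyUse`, and `collectCallsTo` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function declaration or definition.
struct ToyFunction {
  std::string Name;
};

// Hypothetical model of one use of a function: either a direct call or some
// other use, such as passing its address as an argument.
struct ToyUse {
  bool IsCall;               // true if this use is a direct call
  const ToyFunction *Callee; // the function being used
  const ToyFunction *Parent; // the function that contains the use
};

// Keep all direct calls to Callee. If In is non-null, keep only the calls
// located in In; a null In means "any function".
std::vector<const ToyUse *> collectCallsTo(const std::vector<ToyUse> &Uses,
                                           const ToyFunction &Callee,
                                           const ToyFunction *In) {
  std::vector<const ToyUse *> Calls;
  for (const ToyUse &U : Uses) {
    assert(U.Parent && "every use is located in some function");
    if (U.IsCall && U.Callee == &Callee && (!In || U.Parent == In))
      Calls.push_back(&U);
  }
  return Calls;
}
```

Passing a null filter corresponds to the module-wide searches for `__kmpc_fork_call` and `__kmpc_omp_task`, while a non-null filter corresponds to the barrier search inside a single outlined function.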

				Instruction &KMPC_ParallelRegion::getFirstInsertionPoint() const {
				return ParallelSubFn.getEntryBlock().front();
				}

				Function &KMPC_ParallelRegion::getSequentialCodeFunction() const {
				return *getStartPoint().getFunction();
				}

				Function &KMPC_ParallelRegion::getParallelCodeFunction() const {
				return ParallelSubFn;
				}

				void KMPC_ParallelRegion::getDefiniteBarriers(
				SmallVectorImpl<Instruction *> &DefiniteBarriers) const {
				collectCallsToInFunction("__kmpc_barrier", &ParallelSubFn,
				*ParallelSubFn.getParent(), DefiniteBarriers);
				}

				void KMPC_ParallelRegion::getPotentialBarriers(
				SmallVectorImpl<Instruction *> &PotentialBarriers) const {
				InstructionVisitorTy BarrierCollector = [&](Instruction &I) {
				if (!ParallelRegion::isPotentialBarrier(I))
				return true;

				CallInst *CI = dyn_cast<CallInst>(&I);
				if (!CI)
				return true;

				if (CI->getCalledFunction()) {
				StringRef Name = CI->getCalledFunction()->getName();
				if (Name == "__kmpc_for_static_init_4" ||
				Name == "__kmpc_for_static_fini")
				return true;
				}

				PotentialBarriers.push_back(&I);
				return true;
				};

				visit(BarrierCollector);
				}
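
The barrier filter above combines a generic test with a KMPC-specific exclusion list. The following standalone sketch restates that logic under simplifying assumptions: `ToyInst` and both functions are hypothetical, and a single `MayTouchMemory` flag stands in for the side-effect and memory-read queries used by the generic test in this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of an instruction.
struct ToyInst {
  bool IsCall;         // is this a call instruction?
  bool IsIntrinsic;    // is it a call to an intrinsic?
  bool MayTouchMemory; // may it have side effects or read memory?
  std::string Callee;  // callee name; empty for indirect calls and non-calls
};

// Generic test: only non-intrinsic calls that may touch memory qualify.
bool isPotentialBarrier(const ToyInst &I) {
  return I.IsCall && !I.IsIntrinsic && I.MayTouchMemory;
}

// KMPC-specific refinement: the static worksharing init/fini runtime calls
// are skipped, mirroring the two names excluded in the patch.
std::vector<const ToyInst *>
collectPotentialBarriers(const std::vector<ToyInst> &Insts) {
  std::vector<const ToyInst *> Result;
  for (const ToyInst &I : Insts) {
    assert((I.IsCall || I.Callee.empty()) && "only calls have a callee");
    if (!isPotentialBarrier(I))
      continue;
    if (I.Callee == "__kmpc_for_static_init_4" ||
        I.Callee == "__kmpc_for_static_fini")
      continue;
    Result.push_back(&I);
  }
  return Result;
}
```

Note that an indirect call (empty callee name) stays in the result: without a known callee nothing can be excluded, so it is conservatively treated as a potential barrier.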

				Value *KMPC_ParallelRegion::getThreadId() const {
				Value *ThreadIdPtr = ParallelSubFn.arg_begin();

				for (Value *User : ThreadIdPtr->users())
				if (LoadInst *LI = dyn_cast<LoadInst>(User))
				return LI;

				return nullptr;
				}

				Value *KMPC_ParallelRegion::getLocalThreadId() const {
				Value *LocalThreadIdPtr = ParallelSubFn.arg_begin() + 1;

				for (Value *User : LocalThreadIdPtr->users())
				if (LoadInst *LI = dyn_cast<LoadInst>(User))
				return LI;

				return nullptr;
				}

				bool KMPC_ParallelRegion::contains(const BasicBlock *BB,
				const DominatorTree *) const {
				return BB->getParent() == &ParallelSubFn;
				}

				bool KMPC_ParallelRegion::contains(const Instruction *I,
				const DominatorTree *) const {
				return I->getFunction() == &ParallelSubFn || I == &getStartPoint();
				}

				bool KMPC_ParallelRegion::visit(InstructionVisitorTy &Visitor) const {
				for (BasicBlock &BB : ParallelSubFn)
				for (Instruction &I : BB)
				if (!Visitor(I))
				return false;
				return true;
				}

				bool KMPC_ParallelRegion::visit(BlockVisitorTy &Visitor) const {
				for (BasicBlock &BB : ParallelSubFn)
				if (!Visitor(BB, false))
				return false;
				return true;
				}
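
Both `visit` overloads follow the same protocol: the visitor returns `true` to continue and `false` to stop, and `visit` reports whether the traversal ran to completion. A minimal standalone sketch of that protocol, with plain integers standing in for instructions or blocks, might look as follows. `ToyVisitor`, `visitAll`, and `containsItem` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical analogue of the visitor types: return true to continue the
// traversal, false to stop it early.
using ToyVisitor = std::function<bool(int)>;

// Visit items in order. Returns false if the visitor stopped the traversal,
// true if every item was visited.
bool visitAll(const std::vector<int> &Items, const ToyVisitor &Visitor) {
  assert(Visitor && "visitor must be callable");
  for (int Item : Items)
    if (!Visitor(Item))
      return false;
  return true;
}

// A containment query built on top of the visitor: stop as soon as the
// needle is seen and record the hit in a captured flag.
bool containsItem(const std::vector<int> &Items, int Needle) {
  bool Found = false;
  visitAll(Items, [&Found, Needle](int Item) {
    if (Item != Needle)
      return true;
    Found = true;
    return false;
  });
  return Found;
}
```

The generic fallback implementations of `contains` in `RegionInfo.cpp` use the same capture-a-flag idiom, so a region kind only has to provide `visit` to get a correct (if linear-time) containment check.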

				void KMPC_ParallelRegion::print(raw_ostream &OS, unsigned indent) const {
				OS.indent(indent) << "Parallel Region [" << getKind() << "]:\n";
				OS.indent(indent) << " fork call: " << getStartPoint() << "\n";
				OS.indent(indent) << " sub-function: " << ParallelSubFn.getName()
				<< "\n";

				const ParallelIRCommunicationInfo &CI = getCommunicationInfo();
				for (unsigned u = 0; u < CI.getNumCommunicatedValues(); u++)
				OS.indent(indent) << " communicated: " << *CI.getCommunicatedValue(u)
				<< " : " << CI.getCommunicationKind(u) << "\n";
				}

				//===----------------------------------------------------------------------===//
				// KMPC_TaskParallelRegion implementation
				//

				KMPC_TaskParallelRegion::KMPC_TaskParallelRegion(CallInst &KMPC_TaskCI,
				Function &ParallelSubFn,
				ParallelIRRegionInfo &PRI)
				: KMPC_ParallelRegion(KMPC_TaskCI, ParallelSubFn, PRI) {}

				void KMPC_TaskParallelRegion::findKMPCTaskCalls(
				Module &M, ParallelIRRegionInfo &PRI) {
				SmallVector<CallInst *, 8> KMPC_TaskCalls;
				collectCallsToInFunction("__kmpc_omp_task", nullptr, M, KMPC_TaskCalls);

				// Calls of the "__kmpc_omp_task" function are actually parallel regions.
				for (CallInst *CI : KMPC_TaskCalls) {

				Function *ParallelFunc =
				cast<Function>(CI->getArgOperand(2)->stripPointerCasts());
				assert(ParallelFunc);

				new KMPC_TaskParallelRegion(*CI, *ParallelFunc, PRI);
				}
				}

				void findKMPCTaskCalls(Module &M, ParallelIRRegionInfo &PRI) {
				KMPC_TaskParallelRegion::findKMPCTaskCalls(M, PRI);
				}

				//===----------------------------------------------------------------------===//
				// KMPC_ForkParallelRegion implementation
				//

				KMPC_ForkParallelRegion::KMPC_ForkParallelRegion(CallInst &KMPC_ForkCI,
				Function &ParallelSubFn,
				ParallelIRRegionInfo &PRI)
				: KMPC_ParallelRegion(KMPC_ForkCI, ParallelSubFn, PRI) {}

				void KMPC_ForkParallelRegion::findKMPCForkCalls(
				Module &M, ParallelIRRegionInfo &PRI) {
				SmallVector<CallInst *, 8> KMPC_ForkCalls;
				collectCallsToInFunction("__kmpc_fork_call", nullptr, M, KMPC_ForkCalls);

				// Calls of the "__kmpc_fork_call" function are actually parallel regions.
				for (CallInst *CI : KMPC_ForkCalls) {

				Function *ParallelFunc =
				cast<Function>(CI->getArgOperand(2)->stripPointerCasts());
				assert(ParallelFunc);

				new KMPC_ForkParallelRegion(*CI, *ParallelFunc, PRI);
				}
				}

				void findKMPCForkCalls(Module &M, ParallelIRRegionInfo &PRI) {
				KMPC_ForkParallelRegion::findKMPCForkCalls(M, PRI);
				}

				//===----------------------------------------------------------------------===//
				// CommunicationInfo implementation
				//

				CallInst &KMPC_CommunicationInfo::getRTCall() const {
				return cast<CallInst>(PR.getStartPoint());
				}

				bool KMPC_CommunicationInfo::getAllCommunicatingParallelRegions(
				SmallVectorImpl<ParallelRegion *> &CommunicatingParallelRegions) const {

				Function &ParallelFn = PR.getParallelSubFn();
				const ParallelIRRegionInfo &PRI = PR.getParallelRegionInfo();
				for (const auto &It : PRI)
				for (ParallelRegion *PR : It.getSecond())
				if (&PR->getParallelCodeFunction() == &ParallelFn)
				CommunicatingParallelRegions.push_back(PR);

				return true;
				}

				unsigned KMPC_CommunicationInfo::getNumCommunicatedValues() const {
				CallInst &CI = getRTCall();
				return CI.getNumArgOperands() - 3;
				}

				Value *KMPC_CommunicationInfo::getCommunicatedValue(unsigned Idx) const {
				CallInst &CI = getRTCall();
				return CI.getArgOperand(Idx + 3);
				}

				ParallelIRCommunicationInfo::CommunicationKind
				KMPC_CommunicationInfo::getCommunicationKind(unsigned Idx) const {
				Value *CommunicatedValue = getCommunicatedValue(Idx);
				if (!CommunicatedValue->getType()->isPointerTy())
				return CK_VALUE;

				Argument *CommunicatedValueArg =
				cast<Argument>(getCommunicatedValueInParallelRegion(Idx));
				if (!CommunicatedValueArg->hasNoCaptureAttr())
				return CK_VALUE;

				if (CommunicatedValueArg->hasAttribute(Attribute::ReadNone))
				return CK_VALUE;

				if (CommunicatedValueArg->hasAttribute(Attribute::WriteOnly))
				return CK_CONTAINER_OUT;

				if (CommunicatedValueArg->hasAttribute(Attribute::ReadOnly))
				return CK_CONTAINER_IN;

				return CK_CONTAINER_IN_OUT;
				}
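
The classification above is a fixed cascade of attribute checks, and the order of the checks matters. The following standalone sketch mirrors that order; `CommKind`, `ToyArgInfo`, and `classify` are hypothetical stand-ins for the `CK_*` enumerators and the `Argument` queries used in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the CommunicationKind enumerators.
enum class CommKind { Value, ContainerIn, ContainerOut, ContainerInOut };

// Hypothetical summary of what is known about one communicated value as seen
// inside the parallel sub-function.
struct ToyArgInfo {
  bool IsPointer;
  bool NoCapture;
  bool ReadNone;
  bool WriteOnly;
  bool ReadOnly;
};

// Mirror the cascade from the patch: non-pointers, possibly captured
// pointers, and never-accessed pointers are all classified as plain values.
// Only then do the access attributes select a container kind.
CommKind classify(const ToyArgInfo &A) {
  assert(!(A.ReadOnly && A.WriteOnly) && "conflicting access attributes");
  if (!A.IsPointer)
    return CommKind::Value;
  if (!A.NoCapture)
    return CommKind::Value;
  if (A.ReadNone)
    return CommKind::Value;
  if (A.WriteOnly)
    return CommKind::ContainerOut;
  if (A.ReadOnly)
    return CommKind::ContainerIn;
  return CommKind::ContainerInOut;
}
```

Because the no-capture check comes before the access checks, a read-only pointer that may be captured is still reported as a plain value in this scheme.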

				void KMPC_CommunicationInfo::getCommunicatedValues(
				SmallVectorImpl<Value *> &CommunicatedValues) const {
				CallInst &CI = getRTCall();
				for (unsigned u = 3, e = CI.getNumArgOperands(); u < e; u++)
				CommunicatedValues.push_back(CI.getArgOperand(u));
				}

				Value *KMPC_CommunicationInfo::getCommunicatedValueInParallelRegion(
				unsigned Idx) const {
				return PR.getParallelSubFn().arg_begin() + 2 + Idx;
				}

				bool KMPC_CommunicationInfo::hasAnnotatableCommunication() const {
				return PR.getKind() == ParallelRegion::PRK_KMPC_FORK_RT;
				}

				bool KMPC_CommunicationInfo::hasAttributeInParallelRegion(unsigned Idx,
				Attribute::AttrKind Kind) const {
				return cast<Argument>(getCommunicatedValueInParallelRegion(Idx))->hasAttribute(Kind);
				}
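
The communication info above relies on two fixed offsets: communicated values start at operand 3 of the runtime call and at argument 2 of the outlined sub-function. The sketch below spells out that index arithmetic. The helper names and constants are hypothetical, and the stated meaning of the leading arguments (source location, argument count, and outlined function on the call side; two thread-id pointers on the sub-function side) is an assumption based on the usual `__kmpc_fork_call` convention, not something this file states.

```cpp
#include <cassert>

// Assumed layout of the runtime call: (location, argument count, outlined
// function, communicated values...).
const unsigned NumCallPrefixArgs = 3;

// Assumed layout of the outlined function: (global thread id pointer, bound
// thread id pointer, communicated values...).
const unsigned NumSubFnPrefixArgs = 2;

// Number of communicated values for a runtime call with NumCallArgs operands.
unsigned numCommunicatedValues(unsigned NumCallArgs) {
  assert(NumCallArgs >= NumCallPrefixArgs && "malformed runtime call");
  return NumCallArgs - NumCallPrefixArgs;
}

// Operand index at the call site for the communicated value Idx.
unsigned callOperandIndex(unsigned Idx) { return Idx + NumCallPrefixArgs; }

// Argument index in the outlined function for the communicated value Idx.
unsigned subFnArgIndex(unsigned Idx) { return Idx + NumSubFnPrefixArgs; }
```

The offsets differ by one, so the communicated value at index `Idx` is call operand `Idx + 3` in the sequential code and argument `Idx + 2` in the parallel code.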

lib/Analysis/ParallelIR/RegionInfo.cpp

This file was added.

				//===- ParallelIRRegionInfo.cpp - Parallel region detection analysis ------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Implementation of the ParallelIR/RegionInfo analysis.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/ParallelIR/RegionInfo.h"

				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/ADT/Statistic.h"

				using namespace llvm;

				#define DEBUG_TYPE "parallel-region-info"

				STATISTIC(NumParallelRegions, "The # of parallel regions");

				//===----------------------------------------------------------------------===//
				// ParallelRegion implementation
				//

				ParallelRegion::~ParallelRegion() {}

				bool ParallelRegion::isPotentialBarrier(Instruction &I) const {

				if (!isa<CallInst>(I))
				return false;

				if (isa<IntrinsicInst>(I))
				return false;

				if (!I.mayHaveSideEffects() && !I.mayReadFromMemory())
				return false;

				return true;
				}

				bool ParallelRegion::contains(const BasicBlock *BB,
				const DominatorTree *DT) const {
				bool Contains = false;

				// Fallback to a search of all blocks in this task.
				BlockVisitorTy BBVisitor = [BB, &Contains](BasicBlock &CurrentBB,
				bool Boundary) {
				if (BB != &CurrentBB)
				return true;
				Contains = !Boundary;
				return false;
				};

				visit(BBVisitor);
				return Contains;
				}

				bool ParallelRegion::contains(const Instruction *I,
				const DominatorTree *DT) const {
				bool Contains = false;

				// Fallback to a search of all instructions in this task.
				InstructionVisitorTy InstVisitor = [I, &Contains](Instruction &CurI) {
				if (I != &CurI)
				return true;
				Contains = true;
				return false;
				};

				visit(InstVisitor);
				return Contains;
				}

				void ParallelRegion::dump() const { return print(dbgs()); }

				//===----------------------------------------------------------------------===//
				// ParallelIR/RegionInfo implementation
				//

				void ParallelIRRegionInfo::addParallelRegion(ParallelRegion &PR) {
				NumParallelRegions++;
				ParallelRegionsMap[PR.getStartPoint().getFunction()].push_back(&PR);
				}

				void ParallelIRRegionInfo::print(raw_ostream &OS) const {
				for (auto &It : ParallelRegionsMap) {
				assert(It.second.size());
				OS << "Parallel region in " << It.first->getName() << " ["
				<< It.second.size() << "]:\n";
				for (auto *PR : It.second)
				PR->print(OS);
				}
				}

				void ParallelIRRegionInfo::dump() const { print(dbgs()); }

				void ParallelIRRegionInfo::releaseMemory() {
				for (auto &It : ParallelRegionsMap)
				DeleteContainerPointers(It.second);
				ParallelRegionsMap.clear();
				}

				void findKMPCForkCalls(Module &, ParallelIRRegionInfo &);

				void ParallelIRRegionInfo::recalculate(Module &M) {
				releaseMemory();

				bool RecognizeKMPCFork = true;

				if (RecognizeKMPCFork)
				findKMPCForkCalls(M, *this);

				}

				ParallelRegion *ParallelIRRegionInfo::getParallelRegionFor(Instruction *I) const {
				for (ParallelRegion *PR : getParallelRegions(I->getFunction()))
				if (PR->contains(I))
				return PR;
				return nullptr;
				}

				//===----------------------------------------------------------------------===//
				// ParallelIRRegionAnalysis implementation
				//

				AnalysisKey ParallelIRRegionAnalysis::Key;

				ParallelIRRegionInfo
				ParallelIRRegionAnalysis::run(Module &M, ModuleAnalysisManager &MAM) {
				return ParallelIRRegionInfo(M);
				}

				//===----------------------------------------------------------------------===//
				// ParallelIRRegionInfoPass implementation
				//

				bool ParallelIRRegionInfoPass::runOnModule(Module &M) {
				PRI.recalculate(M);
				return false;
				}

				void ParallelIRRegionInfoPass::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesAll();
				}

				void ParallelIRRegionInfoPass::print(raw_ostream &OS, const Module *) const {
				PRI.print(OS);
				}

				void ParallelIRRegionInfoPass::verifyAnalysis() const {
				// TODO: Not yet implemented; this is merely a stub.
				}

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void ParallelIRRegionInfoPass::dump() const { PRI.dump(); }
				#endif

				char ParallelIRRegionInfoPass::ID = 0;

				INITIALIZE_PASS_BEGIN(ParallelIRRegionInfoPass, "pir-regions",
				"Detect parallel regions", false, true)
				INITIALIZE_PASS_END(ParallelIRRegionInfoPass, "pir-regions",
				"Detect parallel regions", false, true)

				namespace llvm {
				ModulePass *createParallelIRRegionInfoPass() {
				return new ParallelIRRegionInfoPass();
				}
				} // namespace llvm

lib/Passes/LLVMBuild.txt

This file was copied to lib/Transforms/ParallelIR/LLVMBuild.txt.

	; http://llvm.org/docs/LLVMBuild.html			; http://llvm.org/docs/LLVMBuild.html
	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;

	[component_0]			[component_0]
	type = Library			type = Library
	name = Passes			name = Passes
	parent = Libraries			parent = Libraries
	required_libraries = AggressiveInstCombine Analysis CodeGen Core IPO InstCombine Scalar Support Target TransformUtils Vectorize Instrumentation			required_libraries = AggressiveInstCombine Analysis CodeGen Core IPO InstCombine Scalar Support Target TransformUtils Vectorize Instrumentation ParallelIR

lib/Passes/PassBuilder.cpp

	#include "llvm/Analysis/ProfileSummaryInfo.h"			#include "llvm/Analysis/ProfileSummaryInfo.h"
	#include "llvm/Analysis/RegionInfo.h"			#include "llvm/Analysis/RegionInfo.h"
	#include "llvm/Analysis/ScalarEvolution.h"			#include "llvm/Analysis/ScalarEvolution.h"
	#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"			#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
	#include "llvm/Analysis/ScopedNoAliasAA.h"			#include "llvm/Analysis/ScopedNoAliasAA.h"
	#include "llvm/Analysis/TargetLibraryInfo.h"			#include "llvm/Analysis/TargetLibraryInfo.h"
	#include "llvm/Analysis/TargetTransformInfo.h"			#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/Analysis/TypeBasedAliasAnalysis.h"			#include "llvm/Analysis/TypeBasedAliasAnalysis.h"
				#include "llvm/Analysis/ParallelIR/RegionInfo.h"
	#include "llvm/CodeGen/PreISelIntrinsicLowering.h"			#include "llvm/CodeGen/PreISelIntrinsicLowering.h"
	#include "llvm/CodeGen/UnreachableBlockElim.h"			#include "llvm/CodeGen/UnreachableBlockElim.h"
	#include "llvm/IR/Dominators.h"			#include "llvm/IR/Dominators.h"
	#include "llvm/IR/IRPrintingPasses.h"			#include "llvm/IR/IRPrintingPasses.h"
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/IR/Verifier.h"			#include "llvm/IR/Verifier.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/Regex.h"			#include "llvm/Support/Regex.h"
	// ...
	#include "llvm/Transforms/Scalar/SCCP.h"			#include "llvm/Transforms/Scalar/SCCP.h"
	#include "llvm/Transforms/Scalar/SROA.h"			#include "llvm/Transforms/Scalar/SROA.h"
	#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"			#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
	#include "llvm/Transforms/Scalar/SimplifyCFG.h"			#include "llvm/Transforms/Scalar/SimplifyCFG.h"
	#include "llvm/Transforms/Scalar/Sink.h"			#include "llvm/Transforms/Scalar/Sink.h"
	#include "llvm/Transforms/Scalar/SpeculateAroundPHIs.h"			#include "llvm/Transforms/Scalar/SpeculateAroundPHIs.h"
	#include "llvm/Transforms/Scalar/SpeculativeExecution.h"			#include "llvm/Transforms/Scalar/SpeculativeExecution.h"
	#include "llvm/Transforms/Scalar/TailRecursionElimination.h"			#include "llvm/Transforms/Scalar/TailRecursionElimination.h"
				#include "llvm/Transforms/ParallelIR/AttributeAnnotator.h"
	#include "llvm/Transforms/Utils/AddDiscriminators.h"			#include "llvm/Transforms/Utils/AddDiscriminators.h"
	#include "llvm/Transforms/Utils/BreakCriticalEdges.h"			#include "llvm/Transforms/Utils/BreakCriticalEdges.h"
	#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"			#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
	#include "llvm/Transforms/Utils/LCSSA.h"			#include "llvm/Transforms/Utils/LCSSA.h"
	#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"			#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"
	#include "llvm/Transforms/Utils/LoopSimplify.h"			#include "llvm/Transforms/Utils/LoopSimplify.h"
	#include "llvm/Transforms/Utils/LowerInvoke.h"			#include "llvm/Transforms/Utils/LowerInvoke.h"
	#include "llvm/Transforms/Utils/Mem2Reg.h"			#include "llvm/Transforms/Utils/Mem2Reg.h"

lib/Passes/PassRegistry.def

	#endif			#endif
	MODULE_ANALYSIS("callgraph", CallGraphAnalysis())			MODULE_ANALYSIS("callgraph", CallGraphAnalysis())
	MODULE_ANALYSIS("lcg", LazyCallGraphAnalysis())			MODULE_ANALYSIS("lcg", LazyCallGraphAnalysis())
	MODULE_ANALYSIS("module-summary", ModuleSummaryIndexAnalysis())			MODULE_ANALYSIS("module-summary", ModuleSummaryIndexAnalysis())
	MODULE_ANALYSIS("no-op-module", NoOpModuleAnalysis())			MODULE_ANALYSIS("no-op-module", NoOpModuleAnalysis())
	MODULE_ANALYSIS("profile-summary", ProfileSummaryAnalysis())			MODULE_ANALYSIS("profile-summary", ProfileSummaryAnalysis())
	MODULE_ANALYSIS("targetlibinfo", TargetLibraryAnalysis())			MODULE_ANALYSIS("targetlibinfo", TargetLibraryAnalysis())
	MODULE_ANALYSIS("verify", VerifierAnalysis())			MODULE_ANALYSIS("verify", VerifierAnalysis())
				MODULE_ANALYSIS("pir-regions", ParallelIRRegionAnalysis())

	#ifndef MODULE_ALIAS_ANALYSIS			#ifndef MODULE_ALIAS_ANALYSIS
	#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS) \			#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS) \
	MODULE_ANALYSIS(NAME, CREATE_PASS)			MODULE_ANALYSIS(NAME, CREATE_PASS)
	#endif			#endif
	MODULE_ALIAS_ANALYSIS("globals-aa", GlobalsAA())			MODULE_ALIAS_ANALYSIS("globals-aa", GlobalsAA())
	#undef MODULE_ALIAS_ANALYSIS			#undef MODULE_ALIAS_ANALYSIS
	#undef MODULE_ANALYSIS			#undef MODULE_ANALYSIS
	MODULE_PASS("rewrite-statepoints-for-gc", RewriteStatepointsForGC())			MODULE_PASS("rewrite-statepoints-for-gc", RewriteStatepointsForGC())
	MODULE_PASS("rewrite-symbols", RewriteSymbolPass())			MODULE_PASS("rewrite-symbols", RewriteSymbolPass())
	MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())			MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())
	MODULE_PASS("sample-profile", SampleProfileLoaderPass())			MODULE_PASS("sample-profile", SampleProfileLoaderPass())
	MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())			MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())
	MODULE_PASS("synthetic-counts-propagation", SyntheticCountsPropagation())			MODULE_PASS("synthetic-counts-propagation", SyntheticCountsPropagation())
	MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass())			MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass())
	MODULE_PASS("verify", VerifierPass())			MODULE_PASS("verify", VerifierPass())
				MODULE_PASS("pir-attribute-annotator", ParallelIRAttributeAnnotatorPass())
	#undef MODULE_PASS			#undef MODULE_PASS

	#ifndef CGSCC_ANALYSIS			#ifndef CGSCC_ANALYSIS
	#define CGSCC_ANALYSIS(NAME, CREATE_PASS)			#define CGSCC_ANALYSIS(NAME, CREATE_PASS)
	#endif			#endif
	CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())			CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())
	CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())			CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
	#undef CGSCC_ANALYSIS			#undef CGSCC_ANALYSIS
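The registry entries above follow the X-macro pattern: one list of `NAME, CREATE_PASS` pairs is expanded several times with different macro definitions. The sketch below shows the idea in a self-contained form. `DEMO_MODULE_PASSES`, `DemoVerifierPass`, `DemoAnnotatorPass`, `registeredPassNames`, and `runPassByName` are hypothetical names for illustration only; the real file is a `.def` that is included repeatedly rather than a single list macro.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pass classes; only their names matter for this sketch.
struct DemoVerifierPass {};
struct DemoAnnotatorPass {};

// Hypothetical stand-in for the contents of a .def file. Each entry has the
// same NAME, CREATE_PASS shape as the MODULE_PASS lines above.
#define DEMO_MODULE_PASSES(ENTRY)                                              \
  ENTRY("verify", DemoVerifierPass())                                          \
  ENTRY("pir-attribute-annotator", DemoAnnotatorPass())

// First expansion: list every registered pipeline name.
inline std::vector<std::string> registeredPassNames() {
  std::vector<std::string> Names;
#define ADD_NAME(NAME, CREATE_PASS) Names.push_back(NAME);
  DEMO_MODULE_PASSES(ADD_NAME)
#undef ADD_NAME
  return Names;
}

// Second expansion: construct the pass whose name matches and report success.
inline bool runPassByName(const std::string &Name) {
  assert(!Name.empty() && "expected a pipeline name");
#define TRY_PASS(NAME, CREATE_PASS)                                            \
  if (Name == NAME) {                                                          \
    auto Pass = CREATE_PASS;                                                   \
    (void)Pass;                                                                \
    return true;                                                               \
  }
  DEMO_MODULE_PASSES(TRY_PASS)
#undef TRY_PASS
  return false;
}
```

Adding a pass then means adding one entry to the list; every expansion picks it up without further changes.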

lib/Transforms/CMakeLists.txt

	add_subdirectory(Utils)			add_subdirectory(Utils)
	add_subdirectory(Instrumentation)			add_subdirectory(Instrumentation)
	add_subdirectory(AggressiveInstCombine)			add_subdirectory(AggressiveInstCombine)
	add_subdirectory(InstCombine)			add_subdirectory(InstCombine)
	add_subdirectory(Scalar)			add_subdirectory(Scalar)
	add_subdirectory(IPO)			add_subdirectory(IPO)
	add_subdirectory(Vectorize)			add_subdirectory(Vectorize)
	add_subdirectory(Hello)			add_subdirectory(Hello)
	add_subdirectory(ObjCARC)			add_subdirectory(ObjCARC)
	add_subdirectory(Coroutines)			add_subdirectory(Coroutines)
				add_subdirectory(ParallelIR)

lib/Transforms/IPO/LLVMBuild.txt

	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;

	[component_0]			[component_0]
	type = Library			type = Library
	name = IPO			name = IPO
	parent = Transforms			parent = Transforms
	library_name = ipo			library_name = ipo
	required_libraries = AggressiveInstCombine Analysis BitReader BitWriter Core InstCombine IRReader Linker Object ProfileData Scalar Support TransformUtils Vectorize Instrumentation			required_libraries = AggressiveInstCombine Analysis BitReader BitWriter Core InstCombine IRReader Linker Object ProfileData Scalar Support TransformUtils Vectorize Instrumentation ParallelIR

lib/Transforms/IPO/PassManagerBuilder.cpp

#include "llvm/Support/ManagedStatic.h"		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"		#include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"		#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/InferFunctionAttrs.h"		#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
#include "llvm/Transforms/InstCombine/InstCombine.h"		#include "llvm/Transforms/InstCombine/InstCombine.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
		#include "llvm/Transforms/ParallelIR.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"		#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"

using namespace llvm;		using namespace llvm;

▲ Show 20 Lines • Show All 417 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
bool PrepareForThinLTOUsingPGOSampleProfile =		bool PrepareForThinLTOUsingPGOSampleProfile =
PrepareForThinLTO && !PGOSampleUse.empty();		PrepareForThinLTO && !PGOSampleUse.empty();
if (PrepareForThinLTOUsingPGOSampleProfile)		if (PrepareForThinLTOUsingPGOSampleProfile)
DisableUnrollLoops = true;		DisableUnrollLoops = true;

// Infer attributes about declarations if possible.		// Infer attributes about declarations if possible.
MPM.add(createInferFunctionAttrsLegacyPass());		MPM.add(createInferFunctionAttrsLegacyPass());

		if (OptLevel > 2) {
		// Add parallel optimizations to the pass pipeline.
		// FIXME: This should only happen if the input contains
		// parallel constructs as we also add canonicalization
		// passes that might disturb the regular pipeline.
		// TODO: We actually only need CG SCC passes but as we rely on
		// function passes the old pass manager forces us to use
		// module passes here.
		MPM.add(createParallelIRAttributeAnnotatorLegacyPass());
		}

addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);		addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);

if (OptLevel > 2)		if (OptLevel > 2)
MPM.add(createCallSiteSplittingPass());		MPM.add(createCallSiteSplittingPass());

MPM.add(createIPSCCPPass()); // IP SCCP		MPM.add(createIPSCCPPass()); // IP SCCP
MPM.add(createCalledValuePropagationPass());		MPM.add(createCalledValuePropagationPass());
MPM.add(createGlobalOptimizerPass()); // Optimize out global vars		MPM.add(createGlobalOptimizerPass()); // Optimize out global vars

lib/Transforms/LLVMBuild.txt

	;			;
	; For more information on the LLVMBuild system, please see:			; For more information on the LLVMBuild system, please see:
	;			;
	; http://llvm.org/docs/LLVMBuild.html			; http://llvm.org/docs/LLVMBuild.html
	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;

	[common]			[common]
	subdirectories = AggressiveInstCombine Coroutines IPO InstCombine Instrumentation Scalar Utils Vectorize ObjCARC			subdirectories = AggressiveInstCombine Coroutines IPO InstCombine Instrumentation Scalar Utils Vectorize ObjCARC ParallelIR

	[component_0]			[component_0]
	type = Group			type = Group
	name = Transforms			name = Transforms
	parent = Libraries			parent = Libraries

lib/Transforms/ParallelIR/AttributeAnnotator.cpp

This file was added.

				//===- AttributeAnnotator.cpp -- Annotate attr. from/to parallel regions -===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Attribute annotator for parallel regions.
				//
				// This pass tries to add attributes to the instructions representing parallel
				// regions but also to the parallel regions themselves, e.g., their arguments if
				// applicable.
				//
				// TODO: This should actually be a SCC pass on the call graph. However, the old
				// pass manager doesn't allow us to use the function analyses the way we
				// want/need to so it is a module pass for now.
				//===----------------------------------------------------------------------===//
				hfinkel: When you add support for the new pass manager, can you make it a CGSCC pass there?
				jdoerfert: Currently, both pass managers are supported. The scheme is the same as for most of LLVM currently. Once these parallel IR passes drop support for the old pass manager, changing to a CGSCC pass should not be a problem (if I understood Chandler correctly).
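The module-pass workaround discussed in this thread relies on callbacks that hand out per-function analysis results on demand (`FuncResultProviderTy` in the file below). A minimal sketch of that pattern follows. `DemoFunction`, `DemoDomTree`, `DemoAnalysisCache`, and `ModuleLevelClient` are hypothetical stand-ins rather than LLVM types, and the cache only imitates an analysis manager.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for a function and a per-function analysis result.
struct DemoFunction {
  std::string Name;
};
struct DemoDomTree {
  std::string ForFunction;
};

// Same shape as the helper type in the patch: a callback returning a
// reference to the analysis result for a given function.
template <typename T>
using FuncResultProviderTy = std::function<T &(DemoFunction &F)>;

// Hypothetical cache that computes a result on first request.
struct DemoAnalysisCache {
  std::map<std::string, DemoDomTree> Results;
  unsigned NumComputed = 0;

  DemoDomTree &get(DemoFunction &F) {
    assert(!F.Name.empty() && "functions need a name to be cached");
    auto It = Results.find(F.Name);
    if (It == Results.end()) {
      ++NumComputed;
      It = Results.emplace(F.Name, DemoDomTree{F.Name}).first;
    }
    return It->second;
  }
};

// A module-level client that only sees the callback, not the cache behind it.
struct ModuleLevelClient {
  FuncResultProviderTy<DemoDomTree> &DTProvider;

  std::string describe(DemoFunction &F) { return DTProvider(F).ForFunction; }
};
```

Because the client depends only on the callback type, the same transformation code can be driven from either pass manager by swapping the lambda that is bound to the provider.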

				#include "llvm/Transforms/ParallelIR/AttributeAnnotator.h"

				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/CaptureTracking.h"
				#include "llvm/Analysis/ParallelIR/RegionInfo.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Transforms/ParallelIR.h"
				#include "llvm/Transforms/ParallelIR/Builder.h"

				#define DEBUG_TYPE "pir-attribute-annotator"

				using namespace llvm;

				STATISTIC(NumNoAliasArguments, "Number of no-alias parallel region arguments");
				STATISTIC(NumNoCaptureParameters,
				"Number of no-capture parallel region parameters");
				STATISTIC(NumReadNoneParameters,
				"Number of read-none parallel region parameters");
				STATISTIC(NumReadOnlyParameters,
				"Number of read-only parallel region parameters");
				STATISTIC(NumWriteOnlyParameters,
				"Number of write-only parallel region parameters");

				static cl::opt<bool> AnnotateAttributes("pir-annotate-attributes",
				cl::desc("Annotate attributes"),
				cl::Hidden, cl::init(true),
				cl::ZeroOrMore);

				namespace {

				// FIXME: Helper type necessary as long as the parallel IR passes are
				// implemented as module not SCC passes on the CG.
				template <typename T>
				using FuncResultProviderTy = std::function<T &(Function &F)>;

				/// Transformer that annotates parallel regions and their communicated values
				/// with attributes.
				struct AttributeAnnotator {

				/// Constructor that accepts the parallel region info and analyses providers.
				///
				/// @param PRI The parallel region info for this module.
				/// @param AAProvider Callback to get alias information for a function.
				/// @param DTProvider Callback to get the dominator tree for a function.
				AttributeAnnotator(ParallelIRRegionInfo &PRI,
				FuncResultProviderTy<AliasAnalysis> &AAProvider,
				FuncResultProviderTy<DominatorTree> &DTProvider)
				: PRI(PRI), AAProvider(AAProvider), DTProvider(DTProvider) {}

				/// Run the attribute Annotator pass on the parallel regions in @p M.
				///
				/// @param M The module to run on.
				///
				/// @returns True, if any change was made, false otherwise.
				bool runOnModule(Module &M);

				private:
				/// Parameter attributes that we try to move from inside the parallel region
				/// to the outside.
				Attribute::AttrKind const ParameterAttributes[4] = {
				Attribute::NoCapture, Attribute::ReadNone, Attribute::ReadOnly,
				Attribute::WriteOnly};

				std::string const ParameterAttributeNames[4] = {"no-capture", "read-none",
				"read-only", "write-only"};

				/// Statistics to keep track of annotated parameters (see above).
				Statistic *const ParameterAttributeStatistics[4] = {
				&NumNoCaptureParameters, &NumReadNoneParameters, &NumReadOnlyParameters,
				&NumWriteOnlyParameters};

				/// Try to annotate arguments with no-alias, nocapture, etc. attributes.
				bool annotateArgumentAttributes(ParallelRegion &PR);

				/// Annotate parallel region instructions with domain knowledge.
				bool annotateParallelRepresentation(ParallelRegion &PR);

				/// The parallel region info for the module.
				ParallelIRRegionInfo &PRI;

				/// Function analyses provider callbacks.
				///
				///{
				FuncResultProviderTy<AliasAnalysis> &AAProvider;
				FuncResultProviderTy<DominatorTree> &DTProvider;
				///}
				};

				bool AttributeAnnotator::annotateParallelRepresentation(ParallelRegion &PR) {
				switch (PR.getKind()) {
				case ParallelRegion::PRK_KMPC_RT: {
				// We know the parallel runtime library call does not throw an exception.
				assert(isa<CallInst>(PR.getStartPoint()) &&
				"Expected runtime call for KMPC parallel region");
				cast<CallInst>(PR.getStartPoint()).setDoesNotThrow();
				return true;
				}
				default:
				return false;
				}
				}

				bool AttributeAnnotator::runOnModule(Module &M) {
				if (!AnnotateAttributes)
				return false;

				bool Changed = false;
				for (auto &It : PRI) {
				for (ParallelRegion *PR : It.second) {
				Changed |= annotateParallelRepresentation(*PR);
				Changed |= annotateArgumentAttributes(*PR);
				}
				}

				return Changed;
				}

				bool AttributeAnnotator::annotateArgumentAttributes(ParallelRegion &PR) {
				// Try to set no-alias, no-capture, etc. argument annotations if possible.

				LLVM_DEBUG(dbgs() << "Try to annotate argument attributes to "
				<< PR.getStartPoint() << "\n");

				const ParallelIRCommunicationInfo &PRCI = PR.getCommunicationInfo();
				if (!PRCI.hasAnnotatableCommunication()) {
				LLVM_DEBUG(dbgs() << " - Parallel region kind " << PR.getKind()
				<< " does not support communication attributes, skip!\n");
				return false;
				}

				SmallVector<ParallelRegion *, 2> CommunicatingParallelRegions;
				if (!PRCI.getAllCommunicatingParallelRegions(CommunicatingParallelRegions)) {
				LLVM_DEBUG(
				dbgs()
				<< " - Communication involves an unknown user, skip for now!\n");
				return false;
				}

				// Unrolling and inlining might have duplicated the (indirect) call sites of
				// an outlined parallel region. These situations are not supported yet as they
				// would require us to intersect the information from all call sites.
				if (CommunicatingParallelRegions.size() > 1) {
				LLVM_DEBUG(dbgs() << " - Communication involves "
				<< CommunicatingParallelRegions.size()
				<< " parallel regions, skip for now!\n");
				return false;
				}
				assert(!CommunicatingParallelRegions.empty());
				assert(CommunicatingParallelRegions.front() == &PR);

				SmallVector<Value *, 32> CommunicatedValues;
				PRCI.getCommunicatedValues(CommunicatedValues);

				Instruction &PRStartInst = PR.getStartPoint();
				DominatorTree &DT = DTProvider(*PRStartInst.getFunction());

				auto *PIRBuilder = ParallelIRBuilder::Create(PRI, PR.getKind());

				// Set of pointer parameters that might be alias free in the parallel region,
				// thus no-alias arguments.
				SmallPtrSet<Value *, 32> NoAliasCandidates;

				bool Changed = false;
				unsigned NumCommunicatedValues = CommunicatedValues.size();
				for (unsigned Idx = 0; Idx < NumCommunicatedValues; Idx++) {
				if (!CommunicatedValues[Idx])
				continue;

				if (isa<UndefValue>(CommunicatedValues[Idx]))
				continue;

				// For now we skip all non-pointer-type values.
				if (!CommunicatedValues[Idx]->getType()->isPointerTy())
				continue;

				int NumAttributes =
				sizeof(ParameterAttributes) / sizeof(ParameterAttributes[0]);

				// This is for bookkeeping purposes only.
				int NumAttributeStatistics = sizeof(ParameterAttributeStatistics) /
				sizeof(ParameterAttributeStatistics[0]);
				assert(NumAttributes <= NumAttributeStatistics &&
				"Require at least as many attribute statistics as there are "
				"attributes!\n");

				// Propagate the known argument/parameter attributes.
				for (int i = 0; i < NumAttributes; i++) {
				if (!PRCI.hasAttributeInParallelRegion(Idx, ParameterAttributes[i]))
				continue;

				if (!PIRBuilder->addAttributeInSequentialRegion(PRCI, Idx,
				ParameterAttributes[i]))
				continue;

				LLVM_DEBUG({
				hfinkel: I think that I understand why you're doing this, but it really deserves a comment. Also, I don't think that it gives you all of what you want. Even an identified function local could have been captured into a global, and that global could be accessed in the parallel code. If you intend to rule out aliasing via that channel, you also need to explicitly ensure that the value is not captured before the dispatching call site. You can do that with PointerMayBeCapturedBefore. The trick is that if you have multiple successive dispatch calls, you need the first one not to capture everything, thus inhibiting the transformation for later parallel-region dispatches. I'm guessing that propagating the nocapture attribute will do this.
				int NumAttributeNames = sizeof(ParameterAttributeNames) /
				sizeof(ParameterAttributeNames[0]);
				assert(NumAttributes == NumAttributeNames);
				dbgs() << " - Argument " << Idx << " is tagged with "
				<< ParameterAttributeNames[i] << "\n";
				});

				(*ParameterAttributeStatistics[i])++;
				Changed = true;
				}

				// After we propagated "local" parameter attributes we proceed to check
				// if this argument could be marked as no-alias. If it already is marked, or
				// if it might alias with anything, we skip it. To check for the latter, we
				// first require the argument to be identified as function local and not
				// captured up to the point of the parallel region. Later we also verify
				// they do not alias with other arguments to the parallel region.
				if (PRCI.hasAttributeInParallelRegion(Idx, Attribute::NoAlias))
				continue;
				if (!isIdentifiedFunctionLocal(CommunicatedValues[Idx]))
				continue;

				if (PointerMayBeCapturedBefore(CommunicatedValues[Idx], false, true,
				&PRStartInst, &DT))
				continue;

				NoAliasCandidates.insert(CommunicatedValues[Idx]);
				}

				if (NoAliasCandidates.empty()) {
				LLVM_DEBUG(dbgs() << " - Parallel region has no no-alias candidates.\n");
				return Changed;
				}

				SmallVector<Instruction *, 8> PotentialBarriers;
				PR.getPotentialBarriers(PotentialBarriers);

				// This initial version does not support potential (and actual) barriers as
				// the no-alias attributes would interfere with them. A way to combine
				// no-alias attributes and potential/actual barriers is the use of operand
				// bundles.
				if (!PotentialBarriers.empty()) {
				LLVM_DEBUG(dbgs() << " - Parallel region contains "
				<< PotentialBarriers.size()
				<< " potential barriers, skip for now!\n");
				return Changed;
				}

				// While all no-alias candidates do not alias with globals or pointers loaded
				// from memory they might alias with other arguments. To check this we put
				// them all in an alias set tracker and filter out singleton alias sets.
				AliasAnalysis &AA = AAProvider(*PRStartInst.getFunction());
				AliasSetTracker AST(AA);
				AAMDNodes AATags;

				for (Value *CommunicatedValue : CommunicatedValues) {
				if (!CommunicatedValue)
				continue;
				AST.add(CommunicatedValue, MemoryLocation::UnknownSize, AATags);
				}

				for (unsigned Idx = 0; Idx < NumCommunicatedValues; Idx++) {
				if (!NoAliasCandidates.count(CommunicatedValues[Idx]))
				continue;

				assert(CommunicatedValues[Idx]);
				assert(CommunicatedValues[Idx]->getType()->isPointerTy());
				const auto &AliasSet = AST.getAliasSetForPointer(
				CommunicatedValues[Idx], MemoryLocation::UnknownSize, AATags);

				// Check for singleton alias sets, thus pointers that do not alias.
				if (++AliasSet.begin() != AliasSet.end())
				continue;
				assert(AliasSet.isMustAlias());

				if (!PIRBuilder->addAttribute(PRCI, Idx, Attribute::NoAlias))
				continue;

				LLVM_DEBUG(dbgs() << " - Argument " << Idx << " is tagged as no-alias\n");

				Changed = true;
				NumNoAliasArguments++;
				}

				return Changed;
				}

				} // end anonymous namespace

				//===----------------------------------------------------------------------===//
				//
				// Pass Manager integration code
				//
				//===----------------------------------------------------------------------===//
				PreservedAnalyses
				ParallelIRAttributeAnnotatorPass::run(Module &M, ModuleAnalysisManager &MAM) {
				auto &FAM = MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
				FuncResultProviderTy<AliasAnalysis> AAProvider =
				[&](Function &F) -> AliasAnalysis & {
				return FAM.getResult<AAManager>(F);
				};
				FuncResultProviderTy<DominatorTree> DTProvider =
				[&](Function &F) -> DominatorTree & {
				return FAM.getResult<DominatorTreeAnalysis>(F);
				};

				ParallelIRRegionInfo &PRI = MAM.getResult<ParallelIRRegionAnalysis>(M);
				AttributeAnnotator PRM(PRI, AAProvider, DTProvider);
				if (PRM.runOnModule(M))
				return PreservedAnalyses::none();
				return PreservedAnalyses::all();
				}

				namespace {

				struct ParallelIRAttributeAnnotatorLegacyPass : public ModulePass {
				static char ID;

				ParallelIRAttributeAnnotatorLegacyPass() : ModulePass(ID) {
				initializeParallelIRAttributeAnnotatorLegacyPassPass(
				*PassRegistry::getPassRegistry());
				}

				bool runOnModule(Module &M) override {
				if (skipModule(M))
				return false;

				FuncResultProviderTy<DominatorTree> DTProvider =
				[&](Function &F) -> DominatorTree & {
				return getAnalysis<DominatorTreeWrapperPass>(F).getDomTree();
				};
				FuncResultProviderTy<AliasAnalysis> AAProvider =
				[&](Function &F) -> AliasAnalysis & {
				return getAnalysis<AAResultsWrapperPass>(F).getAAResults();
				};

				ParallelIRRegionInfo &PRI =
				getAnalysis<ParallelIRRegionInfoPass>().getParallelIRRegionInfo();
				AttributeAnnotator RM(PRI, AAProvider, DTProvider);

				return RM.runOnModule(M);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<ParallelIRRegionInfoPass>();
				AU.addRequired<AAResultsWrapperPass>();
				AU.addPreserved<ParallelIRRegionInfoPass>();
				AU.setPreservesCFG();
				}
				};

				} // end anonymous namespace

				char ParallelIRAttributeAnnotatorLegacyPass::ID = 0;

				INITIALIZE_PASS_BEGIN(ParallelIRAttributeAnnotatorLegacyPass,
				"pir-attribute-annotator",
				"Annotate attributes to parallel region", false, false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ParallelIRRegionInfoPass)
				INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
				INITIALIZE_PASS_END(ParallelIRAttributeAnnotatorLegacyPass,
				"pir-attribute-annotator",
				"Annotate attributes to parallel region", false, false)

				ModulePass *llvm::createParallelIRAttributeAnnotatorLegacyPass() {
				return new ParallelIRAttributeAnnotatorLegacyPass();
				}
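The no-alias step in this file keeps only those candidates that end up alone in their alias set once all communicated values have been added to the tracker. The sketch below models that filter without LLVM. It assumes a simplified alias query (two pointers may alias when their byte ranges overlap) and uses a union-find to merge sets; `DemoPointer`, `mayAlias`, `findRoot`, and `singletonCandidates` are hypothetical names, and the real `AliasSetTracker` is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model of a communicated pointer: the byte range it may access.
struct DemoPointer {
  unsigned Begin, End; // half-open range [Begin, End)
  bool IsCandidate;    // passed the "function local and not captured" checks
};

// Assumed alias query: two pointers may alias if their ranges overlap.
inline bool mayAlias(const DemoPointer &A, const DemoPointer &B) {
  return A.Begin < B.End && B.Begin < A.End;
}

// Union-find lookup with path halving.
inline std::size_t findRoot(std::vector<std::size_t> &Parent, std::size_t I) {
  while (Parent[I] != I) {
    Parent[I] = Parent[Parent[I]];
    I = Parent[I];
  }
  return I;
}

// Returns the indices of candidates that end up alone in their alias set.
inline std::vector<std::size_t>
singletonCandidates(const std::vector<DemoPointer> &Ptrs) {
  std::vector<std::size_t> Parent(Ptrs.size());
  std::iota(Parent.begin(), Parent.end(), 0);

  // Merge the sets of every pair that may alias.
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    for (std::size_t J = I + 1; J < Ptrs.size(); ++J)
      if (mayAlias(Ptrs[I], Ptrs[J])) {
        std::size_t RootI = findRoot(Parent, I);
        std::size_t RootJ = findRoot(Parent, J);
        Parent[RootI] = RootJ;
      }

  // Count the members of each set.
  std::vector<std::size_t> SetSize(Ptrs.size(), 0);
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    ++SetSize[findRoot(Parent, I)];

  // Keep candidates whose set has exactly one member.
  std::vector<std::size_t> Result;
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    if (Ptrs[I].IsCandidate && SetSize[findRoot(Parent, I)] == 1)
      Result.push_back(I);
  return Result;
}
```

As in the patch, non-candidate values still take part in the grouping, so a candidate that overlaps a non-candidate is filtered out as well.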

lib/Transforms/ParallelIR/Builder.cpp

This file was added.

				//===- ParallelIR/Builder.cpp - Parallel region IR builder ----------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Implementation of the general (and abstract) parallel IR builder interface.
				// The interface allows manipulating parallel regions regardless of the
				// underlying representation; representation-specific implementations live
				// in the subclasses.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/ParallelIR/Builder.h"

				#include "llvm/Support/Debug.h"

				#define DEBUG_TYPE "pir-builder"

				using namespace llvm;

				ParallelIRBuilder *createKMPCIRBuilder(ParallelIRRegionInfo &,
				ParallelRegion::ParallelRegionKind PRKind);

				ParallelIRBuilder *
				ParallelIRBuilder::Create(ParallelIRRegionInfo &PRI,
				ParallelRegion::ParallelRegionKind PRKind) {

				switch (PRKind) {
				case ParallelRegion::PRK_KMPC_RT:
				case ParallelRegion::PRK_KMPC_TASK_RT:
				case ParallelRegion::PRK_KMPC_FORK_RT:
				return createKMPCIRBuilder(PRI, PRKind);
				default:
				break;
				}

				llvm_unreachable("No builder for chosen parallel region kind available!");
				}
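The factory above picks a representation-specific builder from the region kind, so callers only ever see the abstract interface. A stripped-down sketch of that dispatch is shown below. `DemoRegionKind`, `DemoBuilder`, `DemoKMPCBuilder`, and `createDemoBuilder` are hypothetical names; the sketch returns a `std::unique_ptr` and throws for unknown kinds, whereas the patch returns a raw pointer and ends in `llvm_unreachable`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical region kinds, loosely mirroring the PRK_* enumerators above.
enum class DemoRegionKind { KMPCFork, KMPCTask, Unknown };

// Abstract interface the optimization talks to.
struct DemoBuilder {
  virtual ~DemoBuilder() = default;
  virtual std::string representation() const = 0;
};

// Implementation for the two KMPC encodings.
struct DemoKMPCBuilder : DemoBuilder {
  explicit DemoKMPCBuilder(DemoRegionKind Kind) : Kind(Kind) {
    assert(Kind == DemoRegionKind::KMPCFork ||
           Kind == DemoRegionKind::KMPCTask);
  }

  std::string representation() const override {
    return Kind == DemoRegionKind::KMPCFork ? "kmpc-fork" : "kmpc-task";
  }

  DemoRegionKind Kind;
};

// Factory: choose the implementation from the region kind.
inline std::unique_ptr<DemoBuilder> createDemoBuilder(DemoRegionKind Kind) {
  switch (Kind) {
  case DemoRegionKind::KMPCFork:
  case DemoRegionKind::KMPCTask:
    return std::make_unique<DemoKMPCBuilder>(Kind);
  default:
    break;
  }
  throw std::invalid_argument("no builder for this parallel region kind");
}
```

Returning an owning pointer is a design choice of this sketch: it makes it explicit who releases the builder, which the raw pointer returned by `ParallelIRBuilder::Create` leaves to the caller.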

lib/Transforms/ParallelIR/CMakeLists.txt

This file was added.

				add_llvm_library(LLVMParallelIROpts
				ParallelIR.cpp
				Builder.cpp
				AttributeAnnotator.cpp
				KMPCImpl.cpp

				ADDITIONAL_HEADER_DIRS
				${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
				${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/ParallelIR

				DEPENDS
				intrinsics_gen
				)

lib/Transforms/ParallelIR/KMPCImpl.cpp

This file was added.

				//===- KMPCImpl.cpp - Parallel IR Transformation Implementation -----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Implementation of the parallel IR transformation interface for parallel
				// regions represented with KMPC (OpenMP) runtime calls.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/ParallelIR/Builder.h"

				#include "llvm/ADT/DenseSet.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ParallelIR/KMPCImpl.h"
				#include "llvm/Analysis/ParallelIR/RegionInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Verifier.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "pir-builder"

				//===----------------------------------------------------------------------===//
				// PIRBuilder specialization for the OpenMP KMPC runtime library.
				//

				struct KMPC_ParallelIRBuilder : public ParallelIRBuilder {

				KMPC_ParallelIRBuilder(ParallelIRRegionInfo &PRI,
				ParallelRegion::ParallelRegionKind PRKind)
				: PRI(PRI), PRKind(PRKind) {
				assert(PRKind == ParallelRegion::PRK_KMPC_FORK_RT ||
				PRKind == ParallelRegion::PRK_KMPC_TASK_RT);
				}

				/// Return the offset at which the first subfunction call argument is located.
				unsigned getFirstArgumentOffset() const;

				/// Return the offset at which the first subfunction parameter is located.
				unsigned getFirstParameterOffset() const;

				/// See @ParallelIRBuilder::addAttributeInSequentialRegion
				virtual bool
				addAttributeInSequentialRegion(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx,
				Attribute::AttrKind Kind) const override;

				/// See @ParallelIRBuilder::addAttributeInParallelRegion
				virtual bool
				addAttributeInParallelRegion(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx,
				Attribute::AttrKind Kind) const override;

				/// See @ParallelIRBuilder::addAttribute
				virtual bool addAttribute(const ParallelIRCommunicationInfo &PRCI,
				unsigned Idx,
				Attribute::AttrKind Kind) const override;

				/// The parallel region info pass.
				ParallelIRRegionInfo &PRI;

				/// The actual parallel region info kind this builder was created for.
				/// There are multiple KMPC encodings (tasks/forks) that can be distinguished
				/// this way.
				ParallelRegion::ParallelRegionKind PRKind;
				};

				//===----------------------------------------------------------------------===//
				// KMPC_ParallelIRBuilder implementation

				ParallelIRBuilder *
				createKMPCIRBuilder(ParallelIRRegionInfo &PRI,
				ParallelRegion::ParallelRegionKind PRKind) {
				return new KMPC_ParallelIRBuilder(PRI, PRKind);
				}

				unsigned KMPC_ParallelIRBuilder::getFirstParameterOffset() const {
				assert(PRKind == ParallelRegion::PRK_KMPC_FORK_RT ||
				PRKind == ParallelRegion::PRK_KMPC_TASK_RT);
				return PRKind == ParallelRegion::PRK_KMPC_FORK_RT ? 2 : 1;
				}

				unsigned KMPC_ParallelIRBuilder::getFirstArgumentOffset() const {
				assert(PRKind == ParallelRegion::PRK_KMPC_FORK_RT ||
				PRKind == ParallelRegion::PRK_KMPC_TASK_RT);
				return PRKind == ParallelRegion::PRK_KMPC_FORK_RT ? 3 : 2;
				}

				bool KMPC_ParallelIRBuilder::addAttributeInSequentialRegion(
				const ParallelIRCommunicationInfo &PRCI, unsigned Idx,
				Attribute::AttrKind Kind) const {

				const KMPC_CommunicationInfo &KMPCCI =
				static_cast<const KMPC_CommunicationInfo &>(PRCI);
				CallInst &RTCall = KMPCCI.getRTCall();
				RTCall.addParamAttr(Idx + getFirstArgumentOffset(), Kind);
				return true;
				}

				bool KMPC_ParallelIRBuilder::addAttributeInParallelRegion(
				const ParallelIRCommunicationInfo &PRCI, unsigned Idx,
				Attribute::AttrKind Kind) const {
				Argument *Arg =
				cast<Argument>(PRCI.getCommunicatedValueInParallelRegion(Idx));
				Arg->addAttr(Kind);
				return true;
				}

				bool KMPC_ParallelIRBuilder::addAttribute(
				const ParallelIRCommunicationInfo &PRCI, unsigned Idx,
				Attribute::AttrKind Kind) const {
				addAttributeInSequentialRegion(PRCI, Idx, Kind);
				addAttributeInParallelRegion(PRCI, Idx, Kind);
				return true;
				}
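The builder above translates the index of a communicated value into a position on the runtime call by adding a kind-dependent offset, and `getFirstParameterOffset` gives the matching offset on the outlined function. The sketch below restates that arithmetic on its own, using the offsets from `getFirstArgumentOffset` and `getFirstParameterOffset` in the patch. `DemoKMPCKind`, `callArgumentIndex`, and `outlinedParameterIndex` are hypothetical names.

```cpp
// Hypothetical mirror of the two KMPC encodings handled by the builder.
enum class DemoKMPCKind { Fork, Task };

// Position of communicated value Idx among the runtime call's arguments
// (offsets 3 and 2, as in getFirstArgumentOffset).
inline unsigned callArgumentIndex(DemoKMPCKind Kind, unsigned Idx) {
  return Idx + (Kind == DemoKMPCKind::Fork ? 3u : 2u);
}

// Position of communicated value Idx among the outlined function's parameters
// (offsets 2 and 1, as in getFirstParameterOffset).
inline unsigned outlinedParameterIndex(DemoKMPCKind Kind, unsigned Idx) {
  return Idx + (Kind == DemoKMPCKind::Fork ? 2u : 1u);
}
```

For the fork encoding, the argument offset matches the usual shape of `__kmpc_fork_call`, whose first three arguments are the source location, the argument count, and the outlined function.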

lib/Transforms/ParallelIR/LLVMBuild.txt

This file was copied from lib/Passes/LLVMBuild.txt.

	;===- ./lib/Passes/LLVMBuild.txt -------------------------------- Conf ---===;			;===- ./lib/Transforms/ParallelIR/LLVMBuild.txt ----------------- Conf ---===;
	;			;
	; The LLVM Compiler Infrastructure			; The LLVM Compiler Infrastructure
	;			;
	; This file is distributed under the University of Illinois Open Source			; This file is distributed under the University of Illinois Open Source
	; License. See LICENSE.TXT for details.			; License. See LICENSE.TXT for details.
	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;
	;			;
	; This is an LLVMBuild description file for the components in this subdirectory.			; This is an LLVMBuild description file for the components in this subdirectory.
	;			;
	; For more information on the LLVMBuild system, please see:			; For more information on the LLVMBuild system, please see:
	;			;
	; http://llvm.org/docs/LLVMBuild.html			; http://llvm.org/docs/LLVMBuild.html
	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;

	[component_0]			[component_0]
	type = Library			type = Library
	name = Passes			name = ParallelIR
	parent = Libraries			parent = Transforms
	required_libraries = AggressiveInstCombine Analysis CodeGen Core IPO InstCombine Scalar Support Target TransformUtils Vectorize Instrumentation			library_name = ParallelIROpts
				required_libraries = Analysis Core InstCombine Support TransformUtils

lib/Transforms/ParallelIR/ParallelIR.cpp

This file was added.

				//===-- ParallelIR.cpp ----------------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/ParallelIR.h"

				#include "llvm/InitializePasses.h"

				using namespace llvm;

				/// initializeParallelIROpts - Initialize all passes linked into the
				/// ParallelIROpts library.
				void llvm::initializeParallelIROpts(PassRegistry &Registry) {
				  initializeParallelIRAttributeAnnotatorLegacyPassPass(Registry);
				}

test/Other/opt-O3-pipeline.ll

	; Target Pass Configuration			; Target Pass Configuration
	; CHECK: Type-Based Alias Analysis			; CHECK: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Force set function attributes			; CHECK-NEXT: Force set function attributes
	; CHECK-NEXT: Infer set function attributes			; CHECK-NEXT: Infer set function attributes
				; CHECK-NEXT: Detect parallel regions
				; CHECK-NEXT: Annotate attributes to parallel region
				; CHECK-NEXT: Unnamed pass: implement Pass::getPassName()
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Call-site splitting			; CHECK-NEXT: Call-site splitting
	; CHECK-NEXT: Interprocedural Sparse Conditional Constant Propagation			; CHECK-NEXT: Interprocedural Sparse Conditional Constant Propagation
	; CHECK-NEXT: Called Value Propagation			; CHECK-NEXT: Called Value Propagation
	; CHECK-NEXT: Global Variable Optimizer			; CHECK-NEXT: Global Variable Optimizer
	; CHECK-NEXT: Unnamed pass: implement Pass::getPassName()			; CHECK-NEXT: Unnamed pass: implement Pass::getPassName()
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction

test/Transforms/ParallelIR/kmpc_arg_attributes.ll

This file was added.

				; RUN: opt -analyze -pir-regions %s | FileCheck %s --check-prefix=PIR_REGS
				; RUN: opt -S -pir-attribute-annotator %s | FileCheck %s --check-prefix=PIR_ATTR

				; PIR_REGS: Parallel region in main [1]:
				; PIR_REGS-NEXT: Parallel Region [5]:
				; PIR_REGS-NEXT: fork call: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(
				; PIR_REGS-NEXT: sub-function: .omp_outlined.
				; PIR_REGS-NEXT: communicated: %c = alloca [100 x float], align 16 : 3
				; PIR_REGS-NEXT: communicated: %a = alloca [100 x float], align 16 : 1
				; PIR_REGS-NEXT: communicated: %b = alloca [100 x float], align 16 : 1

				; The PIR_ATTR check lines below verify that %a, %b, and %c are annotated with
				; noalias and nocapture, and that %a and %b are additionally readonly. %c should
				; be writeonly, but it lacks the appropriate argument attribute.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				%ident_t = type { i32, i32, i32, i32, i8* }

				@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@0 = private unnamed_addr constant %ident_t { i32 0, i32 514, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@1 = private unnamed_addr constant %ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@.str.1 = private unnamed_addr constant [13 x i8] c"c[N/2] = %f\0A\00", align 1

				define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
				entry:
				%a = alloca [100 x float], align 16
				%b = alloca [100 x float], align 16
				%c = alloca [100 x float], align 16
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i.0, 100
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%conv = sitofp i32 %i.0 to double
				%mul = fmul double %conv, 1.000000e+00
				%conv1 = fptrunc double %mul to float
				%idxprom = sext i32 %i.0 to i64
				%arrayidx = getelementptr inbounds [100 x float], [100 x float]* %b, i64 0, i64 %idxprom
				store float %conv1, float* %arrayidx, align 4
				%mul2 = fmul float 2.000000e+00, %conv1
				%idxprom3 = sext i32 %i.0 to i64
				%arrayidx4 = getelementptr inbounds [100 x float], [100 x float]* %a, i64 0, i64 %idxprom3
				store float %mul2, float* %arrayidx4, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				; PIR_ATTR: [100 x float]* noalias nocapture %c, [100 x float]* noalias nocapture readonly %a, [100 x float]* noalias nocapture readonly %b)
				call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%ident_t* @1, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, [100 x float]*, [100 x float]*, [100 x float]*)* @.omp_outlined. to void (i32*, i32*, ...)*), [100 x float]* %c, [100 x float]* %a, [100 x float]* %b)
				%arrayidx5 = getelementptr inbounds [100 x float], [100 x float]* %c, i64 0, i64 50
				%0 = load float, float* %arrayidx5, align 8
				%conv6 = fpext float %0 to double
				%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.1, i32 0, i32 0), double %conv6)
				ret i32 0
				}

				; PIR_ATTR: [100 x float]* noalias nocapture dereferenceable(400) %c, [100 x float]* noalias nocapture readonly dereferenceable(400) %a, [100 x float]* noalias nocapture readonly dereferenceable(400) %b) #0 {
				define internal void @.omp_outlined.(i32* noalias nocapture readonly %.global_tid., i32* noalias nocapture readnone %.bound_tid., [100 x float]* nocapture dereferenceable(400) %c, [100 x float]* nocapture readonly dereferenceable(400) %a, [100 x float]* nocapture readonly dereferenceable(400) %b) #0 {
				entry:
				%.omp.lb = alloca i32, align 4
				%.omp.ub = alloca i32, align 4
				%.omp.stride = alloca i32, align 4
				%.omp.is_last = alloca i32, align 4
				store i32 0, i32* %.omp.lb, align 4
				store i32 99, i32* %.omp.ub, align 4
				store i32 1, i32* %.omp.stride, align 4
				store i32 0, i32* %.omp.is_last, align 4
				%0 = load i32, i32* %.global_tid., align 4
				call void @__kmpc_for_static_init_4(%ident_t* @0, i32 %0, i32 34, i32* %.omp.is_last, i32* %.omp.lb, i32* %.omp.ub, i32* %.omp.stride, i32 1, i32 1)
				%1 = load i32, i32* %.omp.ub, align 4
				%cmp = icmp sgt i32 %1, 99
				br i1 %cmp, label %cond.true, label %cond.false

				cond.true: ; preds = %entry
				br label %cond.end

				cond.false: ; preds = %entry
				%2 = load i32, i32* %.omp.ub, align 4
				br label %cond.end

				cond.end: ; preds = %cond.false, %cond.true
				%cond = phi i32 [ 99, %cond.true ], [ %2, %cond.false ]
				store i32 %cond, i32* %.omp.ub, align 4
				%3 = load i32, i32* %.omp.lb, align 4
				br label %omp.inner.for.cond

				omp.inner.for.cond: ; preds = %omp.inner.for.inc, %cond.end
				%.omp.iv.0 = phi i32 [ %3, %cond.end ], [ %add7, %omp.inner.for.inc ]
				%4 = load i32, i32* %.omp.ub, align 4
				%cmp1 = icmp sle i32 %.omp.iv.0, %4
				br i1 %cmp1, label %omp.inner.for.body, label %omp.inner.for.end

				omp.inner.for.body: ; preds = %omp.inner.for.cond
				%mul = mul nsw i32 %.omp.iv.0, 1
				%add = add nsw i32 0, %mul
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds [100 x float], [100 x float]* %a, i64 0, i64 %idxprom
				%5 = load float, float* %arrayidx, align 4
				%idxprom2 = sext i32 %add to i64
				%arrayidx3 = getelementptr inbounds [100 x float], [100 x float]* %b, i64 0, i64 %idxprom2
				%6 = load float, float* %arrayidx3, align 4
				%add4 = fadd float %5, %6
				%idxprom5 = sext i32 %add to i64
				%arrayidx6 = getelementptr inbounds [100 x float], [100 x float]* %c, i64 0, i64 %idxprom5
				store float %add4, float* %arrayidx6, align 4
				br label %omp.body.continue

				omp.body.continue: ; preds = %omp.inner.for.body
				br label %omp.inner.for.inc

				omp.inner.for.inc: ; preds = %omp.body.continue
				%add7 = add nsw i32 %.omp.iv.0, 1
				br label %omp.inner.for.cond

				omp.inner.for.end: ; preds = %omp.inner.for.cond
				br label %omp.loop.exit

				omp.loop.exit: ; preds = %omp.inner.for.end
				call void @__kmpc_for_static_fini(%ident_t* @0, i32 %0)
				ret void
				}

				declare void @__kmpc_for_static_init_4(%ident_t*, i32, i32, i32*, i32*, i32*, i32*, i32, i32)

				declare void @__kmpc_for_static_fini(%ident_t*, i32)

				declare void @__kmpc_fork_call(%ident_t*, i32, void (i32*, i32*, ...)*, ...)

				declare i32 @printf(i8*, ...) #1

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/ParallelIR/kmpc_arg_attributes2.ll

This file was added.

				; RUN: opt -analyze -pir-regions %s | FileCheck %s --check-prefix=PIR_REGS
				; RUN: opt -S -pir-attribute-annotator %s | FileCheck %s --check-prefix=PIR_ATTR
				;
				; PIR_REGS: Parallel region in main [1]:
				; PIR_REGS-NEXT: Parallel Region [5]:
				; PIR_REGS-NEXT: fork call: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(
				; PIR_REGS-NEXT: sub-function: .omp_outlined.
				; PIR_REGS-NEXT: communicated: %c = alloca [10 x float], align 16 : 3
				; PIR_REGS-NEXT: communicated: %a = alloca [10 x float], align 16 : 1
				; PIR_REGS-NEXT: communicated: %b = alloca [10 x float], align 16 : 3

				; The PIR_ATTR check lines below verify that %a, %b, and %c are annotated with
				; nocapture, and that %a is additionally readonly. %c should be writeonly, but
				; it lacks the appropriate argument attribute.
				; Note: noalias is missing due to the potential barriers.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				%ident_t = type { i32, i32, i32, i32, i8* }

				@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@0 = private unnamed_addr constant %ident_t { i32 0, i32 514, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@1 = private unnamed_addr constant %ident_t { i32 0, i32 66, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@2 = private unnamed_addr constant %ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@.str.1 = private unnamed_addr constant [13 x i8] c"c[N/2] = %f\0A\00", align 1

				define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
				entry:
				%a = alloca [10 x float], align 16
				%b = alloca [10 x float], align 16
				%c = alloca [10 x float], align 16
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i.0, 10
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%conv = sitofp i32 %i.0 to double
				%mul = fmul double %conv, 1.000000e+00
				%conv1 = fptrunc double %mul to float
				%idxprom = sext i32 %i.0 to i64
				%arrayidx = getelementptr inbounds [10 x float], [10 x float]* %b, i64 0, i64 %idxprom
				store float %conv1, float* %arrayidx, align 4
				%mul2 = fmul float 2.000000e+00, %conv1
				%idxprom3 = sext i32 %i.0 to i64
				%arrayidx4 = getelementptr inbounds [10 x float], [10 x float]* %a, i64 0, i64 %idxprom3
				store float %mul2, float* %arrayidx4, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				br label %for.cond5

				for.cond5: ; preds = %for.inc11, %for.end
				%i.1 = phi i32 [ 0, %for.end ], [ %inc12, %for.inc11 ]
				%cmp6 = icmp slt i32 %i.1, 10
				br i1 %cmp6, label %for.body8, label %for.end13

				for.body8: ; preds = %for.cond5
				%idxprom9 = sext i32 %i.1 to i64
				%arrayidx10 = getelementptr inbounds [10 x float], [10 x float]* %c, i64 0, i64 %idxprom9
				store float 0.000000e+00, float* %arrayidx10, align 4
				br label %for.inc11

				for.inc11: ; preds = %for.body8
				%inc12 = add nsw i32 %i.1, 1
				br label %for.cond5

				for.end13: ; preds = %for.cond5
				br label %for.cond14

				for.cond14: ; preds = %for.inc18, %for.end13
				%i.2 = phi i32 [ 0, %for.end13 ], [ %inc19, %for.inc18 ]
				%cmp15 = icmp slt i32 %i.2, 10
				br i1 %cmp15, label %for.body17, label %for.end20

				for.body17: ; preds = %for.cond14
				; PIR_ATTR: [10 x float]* nocapture %c, [10 x float]* nocapture readonly %a, [10 x float]* nocapture %b)
				call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%ident_t* @2, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, [10 x float]*, [10 x float]*, [10 x float]*)* @.omp_outlined. to void (i32*, i32*, ...)*), [10 x float]* %c, [10 x float]* %a, [10 x float]* %b)
				br label %for.inc18

				for.inc18: ; preds = %for.body17
				%inc19 = add nsw i32 %i.2, 1
				br label %for.cond14

				for.end20: ; preds = %for.cond14
				%arrayidx21 = getelementptr inbounds [10 x float], [10 x float]* %c, i64 0, i64 5
				%0 = load float, float* %arrayidx21, align 4
				%conv22 = fpext float %0 to double
				%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.1, i32 0, i32 0), double %conv22)
				ret i32 0
				}

				; Note: noalias attributes are not placed due to the (potential) barriers inside the function!
				;
				; PIR_ATTR: [10 x float]* nocapture dereferenceable(40) %c, [10 x float]* nocapture readonly dereferenceable(40) %a, [10 x float]* nocapture dereferenceable(40) %b) #0 {
				define internal void @.omp_outlined.(i32* noalias nocapture readonly %.global_tid., i32* noalias nocapture readnone %.bound_tid., [10 x float]* nocapture dereferenceable(40) %c, [10 x float]* nocapture readonly dereferenceable(40) %a, [10 x float]* nocapture dereferenceable(40) %b) #0 {
				entry:
				%.omp.lb = alloca i32, align 4
				%.omp.ub = alloca i32, align 4
				%.omp.stride = alloca i32, align 4
				%.omp.is_last = alloca i32, align 4
				%.omp.lb12 = alloca i32, align 4
				%.omp.ub13 = alloca i32, align 4
				%.omp.stride14 = alloca i32, align 4
				%.omp.is_last15 = alloca i32, align 4
				store i32 0, i32* %.omp.lb, align 4
				store i32 9, i32* %.omp.ub, align 4
				store i32 1, i32* %.omp.stride, align 4
				store i32 0, i32* %.omp.is_last, align 4
				%0 = load i32, i32* %.global_tid., align 4
				call void @__kmpc_for_static_init_4(%ident_t* @0, i32 %0, i32 34, i32* %.omp.is_last, i32* %.omp.lb, i32* %.omp.ub, i32* %.omp.stride, i32 1, i32 1)
				%1 = load i32, i32* %.omp.ub, align 4
				%cmp = icmp sgt i32 %1, 9
				br i1 %cmp, label %cond.true, label %cond.false

				cond.true: ; preds = %entry
				br label %cond.end

				cond.false: ; preds = %entry
				%2 = load i32, i32* %.omp.ub, align 4
				br label %cond.end

				cond.end: ; preds = %cond.false, %cond.true
				%cond = phi i32 [ 9, %cond.true ], [ %2, %cond.false ]
				store i32 %cond, i32* %.omp.ub, align 4
				%3 = load i32, i32* %.omp.lb, align 4
				br label %omp.inner.for.cond

				omp.inner.for.cond: ; preds = %omp.inner.for.inc, %cond.end
				%.omp.iv.0 = phi i32 [ %3, %cond.end ], [ %add9, %omp.inner.for.inc ]
				%4 = load i32, i32* %.omp.ub, align 4
				%cmp2 = icmp sle i32 %.omp.iv.0, %4
				br i1 %cmp2, label %omp.inner.for.body, label %omp.inner.for.end

				omp.inner.for.body: ; preds = %omp.inner.for.cond
				%mul = mul nsw i32 %.omp.iv.0, 1
				%add = add nsw i32 0, %mul
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds [10 x float], [10 x float]* %a, i64 0, i64 %idxprom
				%5 = load float, float* %arrayidx, align 4
				%idxprom3 = sext i32 %add to i64
				%arrayidx4 = getelementptr inbounds [10 x float], [10 x float]* %c, i64 0, i64 %idxprom3
				%6 = load float, float* %arrayidx4, align 4
				%add5 = fadd float %5, %6
				%idxprom6 = sext i32 %add to i64
				%arrayidx7 = getelementptr inbounds [10 x float], [10 x float]* %c, i64 0, i64 %idxprom6
				%7 = load float, float* %arrayidx7, align 4
				%add8 = fadd float %7, %add5
				store float %add8, float* %arrayidx7, align 4
				br label %omp.body.continue

				omp.body.continue: ; preds = %omp.inner.for.body
				br label %omp.inner.for.inc

				omp.inner.for.inc: ; preds = %omp.body.continue
				%add9 = add nsw i32 %.omp.iv.0, 1
				br label %omp.inner.for.cond

				omp.inner.for.end: ; preds = %omp.inner.for.cond
				br label %omp.loop.exit

				omp.loop.exit: ; preds = %omp.inner.for.end
				call void @__kmpc_for_static_fini(%ident_t* @0, i32 %0)
				call void @__kmpc_barrier(%ident_t* @1, i32 %0)
				store i32 0, i32* %.omp.lb12, align 4
				store i32 9, i32* %.omp.ub13, align 4
				store i32 1, i32* %.omp.stride14, align 4
				store i32 0, i32* %.omp.is_last15, align 4
				call void @__kmpc_for_static_init_4(%ident_t* @0, i32 %0, i32 34, i32* %.omp.is_last15, i32* %.omp.lb12, i32* %.omp.ub13, i32* %.omp.stride14, i32 1, i32 1)
				%8 = load i32, i32* %.omp.ub13, align 4
				%cmp17 = icmp sgt i32 %8, 9
				br i1 %cmp17, label %cond.true18, label %cond.false19

				cond.true18: ; preds = %omp.loop.exit
				br label %cond.end20

				cond.false19: ; preds = %omp.loop.exit
				%9 = load i32, i32* %.omp.ub13, align 4
				br label %cond.end20

				cond.end20: ; preds = %cond.false19, %cond.true18
				%cond21 = phi i32 [ 9, %cond.true18 ], [ %9, %cond.false19 ]
				store i32 %cond21, i32* %.omp.ub13, align 4
				%10 = load i32, i32* %.omp.lb12, align 4
				br label %omp.inner.for.cond22

				omp.inner.for.cond22: ; preds = %omp.inner.for.inc36, %cond.end20
				%.omp.iv10.0 = phi i32 [ %10, %cond.end20 ], [ %add37, %omp.inner.for.inc36 ]
				%11 = load i32, i32* %.omp.ub13, align 4
				%cmp23 = icmp sle i32 %.omp.iv10.0, %11
				br i1 %cmp23, label %omp.inner.for.body24, label %omp.inner.for.end38

				omp.inner.for.body24: ; preds = %omp.inner.for.cond22
				%mul25 = mul nsw i32 %.omp.iv10.0, 1
				%add26 = add nsw i32 0, %mul25
				%idxprom27 = sext i32 %add26 to i64
				%arrayidx28 = getelementptr inbounds [10 x float], [10 x float]* %a, i64 0, i64 %idxprom27
				%12 = load float, float* %arrayidx28, align 4
				%idxprom29 = sext i32 %add26 to i64
				%arrayidx30 = getelementptr inbounds [10 x float], [10 x float]* %b, i64 0, i64 %idxprom29
				%13 = load float, float* %arrayidx30, align 4
				%add31 = fadd float %12, %13
				%idxprom32 = sext i32 %add26 to i64
				%arrayidx33 = getelementptr inbounds [10 x float], [10 x float]* %b, i64 0, i64 %idxprom32
				%14 = load float, float* %arrayidx33, align 4
				%add34 = fadd float %14, %add31
				store float %add34, float* %arrayidx33, align 4
				br label %omp.body.continue35

				omp.body.continue35: ; preds = %omp.inner.for.body24
				br label %omp.inner.for.inc36

				omp.inner.for.inc36: ; preds = %omp.body.continue35
				%add37 = add nsw i32 %.omp.iv10.0, 1
				br label %omp.inner.for.cond22

				omp.inner.for.end38: ; preds = %omp.inner.for.cond22
				br label %omp.loop.exit39

				omp.loop.exit39: ; preds = %omp.inner.for.end38
				call void @__kmpc_for_static_fini(%ident_t* @0, i32 %0)
				call void @__kmpc_barrier(%ident_t* @1, i32 %0)
				ret void
				}

				declare void @__kmpc_for_static_init_4(%ident_t*, i32, i32, i32*, i32*, i32*, i32*, i32, i32)

				declare void @__kmpc_for_static_fini(%ident_t*, i32)

				declare void @__kmpc_barrier(%ident_t*, i32)

				declare void @__kmpc_fork_call(%ident_t*, i32, void (i32*, i32*, ...)*, ...)

				declare i32 @printf(i8*, ...) #1

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/ParallelIR/kmpc_noalias_arg.ll

This file was added.

				; RUN: opt -analyze -pir-regions %s | FileCheck %s --check-prefix=PIR_REGS
				; RUN: opt -S -pir-attribute-annotator %s | FileCheck %s --check-prefix=PIR_ATTR

				; PIR_REGS: Parallel region in main [1]:
				; PIR_REGS-NEXT: Parallel Region [5]:
				; PIR_REGS-NEXT: fork call: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(
				; PIR_REGS-NEXT: sub-function: .omp_outlined.
				; PIR_REGS-NEXT: communicated: %c = alloca [100 x float], align 16 : 3
				; PIR_REGS-NEXT: communicated: %a = alloca [100 x float], align 16 : 1
				; PIR_REGS-NEXT: communicated: %b = alloca [100 x float], align 16 : 1

				; The PIR_ATTR check lines below verify that %c is annotated as noalias, while
				; %a and %b are not because they can escape prior to the parallel region.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				%ident_t = type { i32, i32, i32, i32, i8* }

				@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@0 = private unnamed_addr constant %ident_t { i32 0, i32 514, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@1 = private unnamed_addr constant %ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8
				@.str.1 = private unnamed_addr constant [13 x i8] c"c[N/2] = %f\0A\00", align 1

				@Capture = common global float* null, align 8
				declare void @capture([100 x float] *)

				define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
				entry:
				%a = alloca [100 x float], align 16
				%b = alloca [100 x float], align 16
				%c = alloca [100 x float], align 16
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i.0, 100
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%acast = bitcast [100 x float]* %a to float*
				store float* %acast, float ** @Capture
				%conv = sitofp i32 %i.0 to double
				%mul = fmul double %conv, 1.000000e+00
				%conv1 = fptrunc double %mul to float
				%idxprom = sext i32 %i.0 to i64
				%arrayidx = getelementptr inbounds [100 x float], [100 x float]* %b, i64 0, i64 %idxprom
				store float %conv1, float* %arrayidx, align 4
				%mul2 = fmul float 2.000000e+00, %conv1
				%idxprom3 = sext i32 %i.0 to i64
				%arrayidx4 = getelementptr inbounds [100 x float], [100 x float]* %a, i64 0, i64 %idxprom3
				store float %mul2, float* %arrayidx4, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				call void @capture([100 x float]* %b)
				; PIR_ATTR: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%ident_t* @1, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, [100 x float]*, [100 x float]*, [100 x float]*)* @.omp_outlined. to void (i32*, i32*, ...)*), [100 x float]* noalias nocapture %c, [100 x float]* nocapture readonly %a, [100 x float]* nocapture readonly %b)
				call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%ident_t* @1, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, [100 x float]*, [100 x float]*, [100 x float]*)* @.omp_outlined. to void (i32*, i32*, ...)*), [100 x float]* %c, [100 x float]* %a, [100 x float]* %b)
				call void @capture([100 x float]* %c)
				%arrayidx5 = getelementptr inbounds [100 x float], [100 x float]* %c, i64 0, i64 50
				%0 = load float, float* %arrayidx5, align 8
				%conv6 = fpext float %0 to double
				%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.1, i32 0, i32 0), double %conv6)
				ret i32 0
				}

				; PIR_ATTR: [100 x float]* noalias nocapture dereferenceable(400) %c, [100 x float]* nocapture readonly dereferenceable(400) %a, [100 x float]* nocapture readonly dereferenceable(400) %b) #0 {
				define internal void @.omp_outlined.(i32* noalias nocapture readonly %.global_tid., i32* noalias nocapture readnone %.bound_tid., [100 x float]* nocapture dereferenceable(400) %c, [100 x float]* nocapture readonly dereferenceable(400) %a, [100 x float]* nocapture readonly dereferenceable(400) %b) #0 {
				entry:
				%.omp.lb = alloca i32, align 4
				%.omp.ub = alloca i32, align 4
				%.omp.stride = alloca i32, align 4
				%.omp.is_last = alloca i32, align 4
				store i32 0, i32* %.omp.lb, align 4
				store i32 99, i32* %.omp.ub, align 4
				store i32 1, i32* %.omp.stride, align 4
				store i32 0, i32* %.omp.is_last, align 4
				%0 = load i32, i32* %.global_tid., align 4
				call void @__kmpc_for_static_init_4(%ident_t* @0, i32 %0, i32 34, i32* %.omp.is_last, i32* %.omp.lb, i32* %.omp.ub, i32* %.omp.stride, i32 1, i32 1)
				%1 = load i32, i32* %.omp.ub, align 4
				%cmp = icmp sgt i32 %1, 99
				br i1 %cmp, label %cond.true, label %cond.false

				cond.true: ; preds = %entry
				br label %cond.end

				cond.false: ; preds = %entry
				%2 = load i32, i32* %.omp.ub, align 4
				br label %cond.end

				cond.end: ; preds = %cond.false, %cond.true
				%cond = phi i32 [ 99, %cond.true ], [ %2, %cond.false ]
				store i32 %cond, i32* %.omp.ub, align 4
				%3 = load i32, i32* %.omp.lb, align 4
				br label %omp.inner.for.cond

				omp.inner.for.cond: ; preds = %omp.inner.for.inc, %cond.end
				%.omp.iv.0 = phi i32 [ %3, %cond.end ], [ %add7, %omp.inner.for.inc ]
				%4 = load i32, i32* %.omp.ub, align 4
				%cmp1 = icmp sle i32 %.omp.iv.0, %4
				br i1 %cmp1, label %omp.inner.for.body, label %omp.inner.for.end

				omp.inner.for.body: ; preds = %omp.inner.for.cond
				%mul = mul nsw i32 %.omp.iv.0, 1
				%add = add nsw i32 0, %mul
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds [100 x float], [100 x float]* %a, i64 0, i64 %idxprom
				%5 = load float, float* %arrayidx, align 4
				%idxprom2 = sext i32 %add to i64
				%arrayidx3 = getelementptr inbounds [100 x float], [100 x float]* %b, i64 0, i64 %idxprom2
				%6 = load float, float* %arrayidx3, align 4
				%add4 = fadd float %5, %6
				%idxprom5 = sext i32 %add to i64
				%arrayidx6 = getelementptr inbounds [100 x float], [100 x float]* %c, i64 0, i64 %idxprom5
				store float %add4, float* %arrayidx6, align 4
				br label %omp.body.continue

				omp.body.continue: ; preds = %omp.inner.for.body
				br label %omp.inner.for.inc

				omp.inner.for.inc: ; preds = %omp.body.continue
				%add7 = add nsw i32 %.omp.iv.0, 1
				br label %omp.inner.for.cond

				omp.inner.for.end: ; preds = %omp.inner.for.cond
				br label %omp.loop.exit

				omp.loop.exit: ; preds = %omp.inner.for.end
				call void @__kmpc_for_static_fini(%ident_t* @0, i32 %0)
				ret void
				}

				declare void @__kmpc_for_static_init_4(%ident_t*, i32, i32, i32*, i32*, i32*, i32*, i32, i32)

				declare void @__kmpc_for_static_fini(%ident_t*, i32)

				declare void @__kmpc_fork_call(%ident_t*, i32, void (i32*, i32*, ...)*, ...)

				declare i32 @printf(i8*, ...) #1

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
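
The expectations encoded in the three kmpc_*.ll tests above can be summarized as a small decision function. This is only a toy model inferred from the PIR_ATTR check lines, not the analysis implemented by the patch. The names CommunicatedValue and inferAttributes are hypothetical, and nocapture is added unconditionally only because every communicated value in these tests receives it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what is known about one communicated pointer.
struct CommunicatedValue {
  bool WrittenInRegion;     // the parallel region stores through the pointer
  bool EscapesBeforeRegion; // the pointer may escape before the fork call
};

// Returns the attributes the tests expect, in the order they are printed.
std::vector<std::string> inferAttributes(const CommunicatedValue &V,
                                         bool RegionMayHaveBarrier) {
  std::vector<std::string> Attrs;
  // noalias is expected only if the pointer has not escaped earlier and the
  // region contains no (potential) barrier.
  if (!V.EscapesBeforeRegion && !RegionMayHaveBarrier)
    Attrs.push_back("noalias");
  // Assumption of the sketch: the region never captures the pointer.
  Attrs.push_back("nocapture");
  // readonly is expected for values the region only loads from.
  if (!V.WrittenInRegion)
    Attrs.push_back("readonly");
  return Attrs;
}
```

Under this model, %a in kmpc_arg_attributes.ll maps to {false, false} without a barrier, %b in kmpc_arg_attributes2.ll to {true, false} with a barrier, and %a in kmpc_noalias_arg.ll to {false, true} without a barrier.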

tools/bugpoint/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	${LLVM_TARGETS_TO_BUILD}			${LLVM_TARGETS_TO_BUILD}
	Analysis			Analysis
	BitWriter			BitWriter
	CodeGen			CodeGen
	Core			Core
	IPO			IPO
	IRReader			IRReader
	AggressiveInstCombine			AggressiveInstCombine
	InstCombine			InstCombine
	Instrumentation			Instrumentation
	Linker			Linker
	ObjCARCOpts			ObjCARCOpts
				ParallelIROpts
	ScalarOpts			ScalarOpts
	Support			Support
	Target			Target
	TransformUtils			TransformUtils
	Vectorize			Vectorize
	)			)

	# Support plugins.			# Support plugins.

tools/bugpoint/LLVMBuild.txt

	BitReader			BitReader
	BitWriter			BitWriter
	CodeGen			CodeGen
	IRReader			IRReader
	IPO			IPO
	Instrumentation			Instrumentation
	Linker			Linker
	ObjCARC			ObjCARC
				ParallelIR
	Scalar			Scalar
	all-targets			all-targets

tools/opt/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	${LLVM_TARGETS_TO_BUILD}			${LLVM_TARGETS_TO_BUILD}
	AggressiveInstCombine			AggressiveInstCombine
	Analysis			Analysis
	BitWriter			BitWriter
	CodeGen			CodeGen
	Core			Core
	Coroutines			Coroutines
	IPO			IPO
	IRReader			IRReader
	InstCombine			InstCombine
	Instrumentation			Instrumentation
	MC			MC
	ObjCARCOpts			ObjCARCOpts
				ParallelIROpts
	ScalarOpts			ScalarOpts
	Support			Support
	Target			Target
	TransformUtils			TransformUtils
	Vectorize			Vectorize
	Passes			Passes
	)			)


tools/opt/LLVMBuild.txt

	required_libraries =			required_libraries =
	AsmParser			AsmParser
	BitReader			BitReader
	BitWriter			BitWriter
	CodeGen			CodeGen
	IRReader			IRReader
	IPO			IPO
	Instrumentation			Instrumentation
				ParallelIR
	Scalar			Scalar
	ObjCARC			ObjCARC
	Passes			Passes
	all-targets			all-targets

tools/opt/opt.cpp

int main(int argc, char **argv) {
InitializeAllTargetMCs();		InitializeAllTargetMCs();
InitializeAllAsmPrinters();		InitializeAllAsmPrinters();
InitializeAllAsmParsers();		InitializeAllAsmParsers();

// Initialize passes		// Initialize passes
PassRegistry &Registry = *PassRegistry::getPassRegistry();		PassRegistry &Registry = *PassRegistry::getPassRegistry();
initializeCore(Registry);		initializeCore(Registry);
initializeCoroutines(Registry);		initializeCoroutines(Registry);
		initializeParallelIROpts(Registry);
initializeScalarOpts(Registry);		initializeScalarOpts(Registry);
initializeObjCARCOpts(Registry);		initializeObjCARCOpts(Registry);
initializeVectorization(Registry);		initializeVectorization(Registry);
initializeIPO(Registry);		initializeIPO(Registry);
initializeAnalysis(Registry);		initializeAnalysis(Registry);
initializeTransformUtils(Registry);		initializeTransformUtils(Registry);
initializeInstCombine(Registry);		initializeInstCombine(Registry);
initializeAggressiveInstCombine(Registry);		initializeAggressiveInstCombine(Registry);

Revision Contents

Diff 150111

include/llvm/Analysis/ParallelIR/KMPCImpl.h

include/llvm/Analysis/ParallelIR/RegionInfo.h

include/llvm/Analysis/Passes.h

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/ParallelIR.h

include/llvm/Transforms/ParallelIR/AttributeAnnotator.h

include/llvm/Transforms/ParallelIR/Builder.h

lib/Analysis/Analysis.cpp

lib/Analysis/CMakeLists.txt

lib/Analysis/ParallelIR/KMPCImpl.cpp

lib/Analysis/ParallelIR/RegionInfo.cpp

lib/Passes/LLVMBuild.txt

lib/Passes/PassBuilder.cpp

lib/Passes/PassRegistry.def

lib/Transforms/CMakeLists.txt

lib/Transforms/IPO/LLVMBuild.txt

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/LLVMBuild.txt

lib/Transforms/ParallelIR/AttributeAnnotator.cpp

lib/Transforms/ParallelIR/Builder.cpp

lib/Transforms/ParallelIR/CMakeLists.txt

lib/Transforms/ParallelIR/KMPCImpl.cpp

lib/Transforms/ParallelIR/LLVMBuild.txt

lib/Transforms/ParallelIR/ParallelIR.cpp

test/Other/opt-O3-pipeline.ll

test/Transforms/ParallelIR/kmpc_arg_attributes.ll

test/Transforms/ParallelIR/kmpc_arg_attributes2.ll

test/Transforms/ParallelIR/kmpc_noalias_arg.ll

tools/bugpoint/CMakeLists.txt

tools/bugpoint/LLVMBuild.txt

tools/opt/CMakeLists.txt

tools/opt/LLVMBuild.txt

tools/opt/opt.cpp
