This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/lib/Transforms/IPO/OpenMPOpt.cpp (22/28)
- openmp/runtime/src/kmp_sched.cpp (1/1)

Differential D75384

OpenMP for loop fusion
Needs Revision · Public

Authored by abidmalikwaterloo on Feb 28 2020, 1:35 PM.


Details

Reviewers

jdoerfert

Summary

The patch combines two OpenMP for loops that use static scheduling and have the same parameters.
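The premise can be illustrated with a simplified model. Assuming a static schedule without a chunk size, the bounds a thread receives are a pure function of the loop bounds, the team size, and the thread id, so a second init call with identical arguments recomputes exactly what the first one returned. The helper `staticBalancedChunk` below is hypothetical and only loosely mirrors the `kmp_sch_static_balanced` policy; it is not the libomp implementation and ignores strides, chunk sizes, and the greedy variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical, simplified "static balanced" schedule: the unit-stride
// iteration space [lower, upper] is split into nth contiguous chunks whose
// sizes differ by at most one. Returns the inclusive [lb, ub] assigned to
// thread tid; lb > ub means the thread gets no iterations.
std::pair<int64_t, int64_t> staticBalancedChunk(int64_t lower, int64_t upper,
                                                int64_t nth, int64_t tid) {
  assert(nth > 0 && tid >= 0 && tid < nth && lower <= upper);
  const int64_t trip = upper - lower + 1;
  const int64_t small = trip / nth;  // iterations every thread gets
  const int64_t extras = trip % nth; // the first `extras` threads get one more
  const int64_t lb = lower + tid * small + std::min(tid, extras);
  const int64_t ub = lb + small - 1 + (tid < extras ? 1 : 0);
  return {lb, ub};
}
```

Because the result depends only on its inputs, two loops with the same bounds and the same team see the same per-thread ranges, which is what makes sharing one init/fini pair plausible.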

Diff Detail

Event Timeline

abidmalikwaterloo created this revision. Feb 28 2020, 1:35 PM

Herald added a project: Restricted Project. Feb 28 2020, 1:35 PM

Herald added subscribers: llvm-commits, guansong, hiraditya.

Harbormaster failed remote builds in B47638: Diff 247361! Feb 28 2020, 1:51 PM

This patch *does not* perform loop fusion. Nor do we want it to. We want to deduplicate OpenMP runtime calls that are involved in OpenMP worksharing loops. Please make this clear in the commit message and description. The latter should go into more detail what is happening here and why.

There are various problems wrt. the coding standards. I will only comment on a few but the patch needs to be updated according to the coding standard and the surrounding code patterns.

Why are there no tests?

The code is complicated and hard to read, especially given the lack of comments and documentation. Please read through my inline comments. Afterwards, I suggest you start by

identifying call sites as the surrounding code does it, then
filter call sites that are not eligible on their own, then
for each call site, find a matching one, if found, replace and rewrite and try again.
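The three steps can be sketched on a toy model in which each worksharing loop is reduced to the arguments of its init call plus flags recording whether its init and fini calls survive. `WorksharingLoop` and `dedupAdjacentStaticInits` are hypothetical names, and the `Eligible` flag is a placeholder for the legality checks (dominance, no intervening side effects) that a real pass would have to perform on LLVM IR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one `#pragma omp for` loop as seen by the pass.
struct WorksharingLoop {
  std::vector<long> InitArgs; // stand-in for the operands of the init call
  bool Eligible = true;       // placeholder for real legality checks
  bool HasInit = true;        // __kmpc_for_static_init_4 still present?
  bool HasFini = true;        // __kmpc_for_static_fini still present?
};

// Step 1 (collect call sites) is assumed done: Loops is in program order.
// Step 2 skips loops that are not eligible. Step 3 merges a loop with its
// successor when the init arguments match, by dropping the first loop's
// fini and the second loop's init. A chain of matching loops collapses into
// a single init/fini pair. Returns the number of runtime calls removed.
std::size_t dedupAdjacentStaticInits(std::vector<WorksharingLoop> &Loops) {
  std::size_t Removed = 0;
  for (std::size_t I = 0; I + 1 < Loops.size(); ++I) {
    WorksharingLoop &Cur = Loops[I];
    WorksharingLoop &Next = Loops[I + 1];
    if (!Cur.Eligible || !Next.Eligible)
      continue; // e.g. no dominance, or side effects in between
    if (Cur.InitArgs != Next.InitArgs)
      continue; // no match: keep both init/fini pairs
    Cur.HasFini = false;
    Next.HasInit = false;
    Removed += 2;
  }
  return Removed;
}
```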

You should have test cases that are developed as part of the patch. You should also put early versions for review so you can get feedback.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
130	As mentioned in the main message, this does not perform loop fusion. It is a more complex deduplication of runtime calls that may enable loop fusion but for the sake of this patch and this code it is just runtime call deduplication.
171	While I am pretty sure the entire map construction is overly complicated and should be replaced, you can achieve the above with something like: `for (const auto &It : m) if (llvm::any_of(It.second, [I](CallInst *CI) { return I == CI; })) return true; return false;`
224	Why does it always return false?
230	Please use modern C++. LLVM is currently based on C++14, for years we allow range based loops: `for (Instruction *I : instructions(F)) {`
233	Please take a look at `deduplicateRuntimeCalls` and `deleteParallelRegions`. There we utilize a caching system to find calls to OpenMP runtime functions. It is important that we do not cause a constant overhead in this pass by scanning instructions multiple times.
242	Since you are interested in special stores only it would make sense to look for them explicitly, e.g., by following the uses of arguments passed into the `__kmpc_for_static_init_4` call.
248	Most of these functions lack documentation and all of them lack comments explaining the code. The names need to be adjusted wrt. the coding standard and to make them expressive. For example, "clean" is nothing I can associate with an action.
251	We cannot have stray debug output.
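For the comment on line 171, a standalone version of the suggested rewrite could look like the following. `CallInst` here is a hypothetical empty struct standing in for `llvm::CallInst`, and `std::any_of` is used in place of `llvm::any_of` (from `llvm/ADT/STLExtras.h`) so the sketch compiles without LLVM. Taking the map by `const` reference also avoids the full copy that the patch's by-value `find` parameter makes on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for llvm::CallInst; only its address matters here.
struct CallInst {};

using CompatMap = std::map<const CallInst *, std::vector<const CallInst *>>;

// Returns true if I already appears in any compatibility list (map value).
// Keys are not searched, matching the behavior of the patch's `find`.
bool isAlreadyMapped(const CallInst *I, const CompatMap &M) {
  for (const auto &It : M)
    if (std::any_of(It.second.begin(), It.second.end(),
                    [I](const CallInst *CI) { return CI == I; }))
      return true;
  return false;
}
```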

abidmalikwaterloo updated this revision to Diff 250112. Mar 12 2020, 6:05 PM

abidmalikwaterloo marked an inline comment as done.

You need to squash your commits or create the diff against the proper base commit. More comments inlined but there is a lot more to comment on.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
151	The example above is generally nice but maybe we should not print values in parallel, just initialize two arrays. Style: Clang format all code and comments (=the entire file). Also adjust the indentation in the example.
153	Nit: two spaces, if you write sentences add the '.'.
166	Changed should never be set to false once it is true. However, only because you do that this works. You need a local changed or just `if(...)` somewhere.
178	Having a function like this is often a side-effect of improper data-structures. If you need to cache all calls that have been looked at, add a `SmallPtrSet<CallInst*, 8>` to do so instead of iterating over all vectors that are keys in a map.
191–193	Please use llvm data structures and give variables proper names.
228	This function is quadratic in the number of `__kmpc_static_init4` calls.
250–254	This class provides a way to access/iterate over all call sites of a known OpenMP runtime call. Please use that.
309	The formatting of this function makes it really hard to read it. Please clang-format your patch before uploading it.
309	Please do not iterate over the entire function, basically ever. Start with the things you are interested in and go from there. Iterating over the entire function is at some point costly and always wasteful.
313	No commented out code.
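The comments on lines 178 and 228 point at the same cost problem: comparing every `__kmpc_static_init4` call with every other one is quadratic. One common alternative, sketched below under the assumption that each call's arguments have already been resolved to comparable values, is to group calls by their argument list in a single pass, which needs O(n log n) key comparisons instead of O(n^2) pairwise checks. `Args` and `groupByArguments` are hypothetical; in LLVM the call handles would be `CallInst *` and any per-call cache a `SmallPtrSet<CallInst *, 8>`, as the reviewer suggests. Grouping alone says nothing about adjacency or dominance, which still have to be checked within each group.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical: the resolved argument values of one init call.
using Args = std::vector<long>;

// Groups call indices by identical argument lists. Each call is visited
// exactly once, so no separate "already mapped?" search is needed.
std::map<Args, std::vector<std::size_t>>
groupByArguments(const std::vector<Args> &Calls) {
  std::map<Args, std::vector<std::size_t>> Groups;
  for (std::size_t I = 0; I < Calls.size(); ++I)
    Groups[Calls[I]].push_back(I); // indices stay in program order
  return Groups;
}
```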

This doesn't seem to use dominance at all. How do you handle

if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}
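The example above needs a dominance check: neither branch's init call executes on every path to the other, so neither may replace the other. A real pass would query LLVM's `DominatorTree`; the self-contained sketch below instead computes dominator sets with the classic iterative data-flow equations on a hypothetical four-block CFG (entry, then, else, merge) that mirrors the `if`/`else` above. `CFG`, `computeDominators`, and `dominates` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Blocks are numbered 0..N-1, block 0 is the entry, Preds[B] lists the
// predecessors of B. Every block is assumed reachable from the entry.
using CFG = std::vector<std::vector<std::size_t>>;
using DomSets = std::vector<std::set<std::size_t>>;

// Iterative data-flow: Dom(entry) = {entry};
// Dom(B) = {B} union (intersection of Dom(P) over all predecessors P).
DomSets computeDominators(const CFG &Preds) {
  const std::size_t N = Preds.size();
  std::set<std::size_t> All;
  for (std::size_t B = 0; B < N; ++B)
    All.insert(B);
  DomSets Dom(N, All);
  Dom[0] = {0};
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      std::set<std::size_t> NewDom = All;
      for (std::size_t P : Preds[B]) {
        std::set<std::size_t> Tmp;
        for (std::size_t D : NewDom)
          if (Dom[P].count(D))
            Tmp.insert(D);
        NewDom = Tmp;
      }
      NewDom.insert(B);
      if (NewDom != Dom[B]) {
        Dom[B] = NewDom;
        Changed = true;
      }
    }
  }
  return Dom;
}

// True if block A dominates block B. For merging, the block of the kept init
// call must dominate the block of the removed one (necessary, not sufficient).
bool dominates(const DomSets &Dom, std::size_t A, std::size_t B) {
  return Dom[B].count(A) != 0;
}
```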

This revision now requires changes to proceed. Mar 19 2020, 10:29 PM

Hmm. Random question: are these optimizations friendly to the openmp tooling?

In D75384#1933294, @lebedev.ri wrote:

Hmm. Random question: are these optimizations friendly to the openmp tooling?

Not yet but it's on my list. The plan was to start improving this once we have an "intrusive" transformation, e.g., parallel region merging I have downstream. As of now, I expect us to emit remarks that a tool can use to match the code and events seen at runtime with the input. Similarly, we need to deal with source location changes, update source locations when we merge calls, etc. (there is a TODO in OpenMPOpt already). Feedback on this is very welcome. I also wanted the input of @jmellorcrummey on that.

In D75384#1932847, @jdoerfert wrote:
This doesn't seem to use dominance at all. How do you handle
if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}

if the value of a is known at the time of compilation, we will only have one "for loop". Therefore, for this specific case, the implemented technique will see only one "for loop" at the IR level. This case should be a problem.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
151	done
153	Not sure what this means. Should I add "." at the end of each comment line?
166	The Changed variable was getting confused with the Changed variable within the run() function. I changed it to a local "changed = false".
313	done

Herald added a subscriber: yaxunl. Apr 1 2020, 4:07 PM

abidmalikwaterloo added a subscriber: alokmishra.besu. Apr 7 2020, 9:25 AM

In D75384#1955925, @abidmalikwaterloo wrote:
In D75384#1932847, @jdoerfert wrote:
This doesn't seem to use dominance at all. How do you handle
if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}
if the value of a is known at the time of compilation, we will only have one "for loop". Therefore, for this specific case, the implemented technique will see only one "for loop" at the IR level. This case should be a problem.

This can be done if we collapse the loops within basic blocks. Then this will make the two for loops independent of each other.

Any update on this?

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
153	You should add a dot, ".", at the end of a full sentence.

abidmalikwaterloo updated this revision to Diff 281030. Jul 27 2020, 1:42 PM

abidmalikwaterloo removed a subscriber: alokmishra.besu.

Herald added a subscriber: sstefan1. Jul 27 2020, 1:42 PM

Can you please diff this against the master branch (any commit). You updated it with a single commit against the old version of this patch. You should probably just merge all you have into a single commit for now.

openmp/runtime/src/kmp_sched.cpp
568	leftover

ggeorgakoudis added a subscriber: ggeorgakoudis. Jul 28 2020, 10:53 AM

Also clang format the patch please.

The patch now handles conditional blocks containing parallel for loops.

abidmalikwaterloo updated this revision to Diff 296335. Oct 5 2020, 5:40 PM

I don't see a difference in the diff.

This revision now requires changes to proceed. Oct 5 2020, 5:42 PM

Yes, I am trying to figure out why it is not loading my tests and changes.

I have been trying to update my patch by following the steps on:

https://llvm.org/docs/Contributing.html#how-to-submit-a-patch

However, I always end up loading nothing.

In D75384#2317444, @abidmalikwaterloo wrote:

I have been trying to update my patch by following the steps on:

https://llvm.org/docs/Contributing.html#how-to-submit-a-patch

However, I always end up loading nothing.

Looks like you have multiple commits and you are only including the last one. Have you considered putting everything in one commit and then you can update this revision?

I tried almost everything. Let me try again.

abidmalikwaterloo mentioned this in D90103: Add OpenMP for optimization. Nov 19 2020, 2:35 PM

See the new patch

https://reviews.llvm.org/D90103

I have submitted a new patch, as I could not upload the modifications through this thread.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
178	See the new patch https://reviews.llvm.org/D90103
191–193	See the new patch https://reviews.llvm.org/D90103
228	See the new patch https://reviews.llvm.org/D90103
250–254	See the new patch https://reviews.llvm.org/D90103 I removed this function in the new implementation.
309	See the new patch https://reviews.llvm.org/D90103 I removed this in the new patch.

abidmalikwaterloo marked 7 inline comments as done. Nov 19 2020, 2:45 PM

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/OpenMPOpt.cpp	281 lines
openmp/runtime/src/kmp_sched.cpp	1 line

Diff 296335

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

(55 lines not shown)
OpenMPOpt(SmallPtrSetImpl<Function *> &SCC,
CallGraphUpdater &CGUpdater)		CallGraphUpdater &CGUpdater)
: M((SCC.begin())->getParent()), SCC(SCC), ModuleSlice(ModuleSlice),		: M((SCC.begin())->getParent()), SCC(SCC), ModuleSlice(ModuleSlice),
OMPBuilder(M), CGUpdater(CGUpdater) {		OMPBuilder(M), CGUpdater(CGUpdater) {
initializeTypes(M);		initializeTypes(M);
initializeRuntimeFunctions();		initializeRuntimeFunctions();
OMPBuilder.initialize();		OMPBuilder.initialize();
}		}

/// Loop fusion		/// Data structure to hold information for the deleting
		/// redundent OpenMP for loop call
struct OMPLoopFusion {		struct OMPLoopFusion {
//		/// Keeps map of __kmpc_static_init4 and its __kmpc_static_fini calls for each OpenMP for loop
std::map<CallInst *, CallInst *> call_init_fini_mapping;		std::map<CallInst *, CallInst *> call_init_fini_mapping;
		/// Keeps map of __kmpc_static_init4 and all its compatilable __kmpc_static_init4 in a vector
std::map<CallInst *, std::vector<CallInst *>> call_map;		std::map<CallInst *, std::vector<CallInst *>> call_map;
std::map<Value *, Value *> store_op0_op1;		/// store_op0_op01 keeps map of operand 1 and operand 0
std::map<Value *, Value *> args_map;		/// args_map keeps map of arguments of __kmpc_static_init4 for later cleaning
		std::map<Value *, Value *> store_op0_op1, args_map;
CallInst *current_call_init_instruction = nullptr;		CallInst *current_call_init_instruction = nullptr;
};		};

/// Generic information that describes a runtime function		/// Generic information that describes a runtime function
struct RuntimeFunctionInfo {		struct RuntimeFunctionInfo {
/// The kind, as described by the RuntimeFunction enum.		/// The kind, as described by the RuntimeFunction enum.
RuntimeFunction Kind;		RuntimeFunction Kind;

(40 lines not shown)
bool run() {
bool Changed = false;		bool Changed = false;

LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size()		LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size()
<< " functions in a slice with " << ModuleSlice.size()		<< " functions in a slice with " << ModuleSlice.size()
<< " functions\n");		<< " functions\n");

Changed |= deduplicateRuntimeCalls();		Changed |= deduplicateRuntimeCalls();
Changed |= deleteParallelRegions();		Changed |= deleteParallelRegions();
Changed |= runTheOMPLoopFusion();		Changed |= deleteStaticScheduleCalls();
		jdoerfert (not done): As mentioned in the main message, this does not perform loop fusion. It is a more complex deduplication of runtime calls that may enable loop fusion but for the sake of this patch and this code it is just runtime call deduplication.

return Changed;		return Changed;
}		}

private:		private:
/// Try to fuse OpenMP for loop with static scheduling		/// Combine "OpenMP for loop with static scheduling"
/// check if all parameters are same and the loops are adjacent		/// check if all parameters are same and the loops are adjacent
/// the two for loops can be used		/// See https://openmp.llvm.org/Reference.pdf. See section 5.8.3.24 for parameters
		/// The two for loops can share the same __kmpc_static_init4() and __kmpc_static_fini()
bool runTheOMPLoopFusion() {		/// calls. Consider the following example
bool Changed = false;		/// #pragma omp parallel
		/// {
		/// #pragma omp for
		/// for (int i=0; i < 10; i++)
		/// printf(i); // Loop-1
		/// #pragma omp for
		/// for (int j=0; j < 10; j++)
		/// printf(j); // Loop-2
		/// }
		/// The __kmpc_static_fini() of Loop-1 and __kmpc_static_init4() of Loop-2
		/// can be removed and two loops can be run under a single pair of __kmpc_static_init4() and __kmpc_static_fini() calls
		jdoerfert (done): The example above is generally nice but maybe we should not print values in parallel, just initialize two arrays. Style: Clang format all code and comments (=the entire file). Also adjust the indentation in the example.
		abidmalikwaterloo (author, done): done

		/// The following function is the main pipeline for the whole process
		jdoerfert (done): Nit: two spaces, if you write sentences add the '.'.
		abidmalikwaterloo (author, done): Not sure what this means. Should I add "." at the end of each comment line?
		jdoerfert (done): You should add a dot, ".", at the end of a full sentence.
		bool deleteStaticScheduleCalls() {
		bool Changed = true;
for (Function *F : SCC) {		for (Function *F : SCC) {
OMPLoopFusion OLF;		OMPLoopFusion OLF;
runOverTheBlock(F, &OLF);		/// Run over the function and prepare the data and data structures
		runOverTheBlock(*F, &OLF);
		/// The following check the compatibility of between two
		/// adjacent OpenMP for loops
checkTheCompatibility(&OLF);		checkTheCompatibility(&OLF);
		/// The following cleans the redundent instructions in the combined loop
Changed = cleanInstructions(&OLF);		Changed = cleanInstructions(&OLF);
replace_UseValues(F, &OLF);		/// The following replaces the use values in the combined loop
// printFunction(F);		if (Changed) replace_UseValues(*F, &OLF);
		jdoerfert (done): Changed should never be set to false once it is true. However, only because you do that this works. You need a local changed or just `if(...)` somewhere.
		abidmalikwaterloo (author, done): The Changed variable was getting confused with the Changed variable within the run() function. I changed it to a local "changed = false".
}		}
return Changed;		return Changed;
}		}

///		/// The following function determines wheither a call instructions has
		jdoerfert (not done): While I am pretty sure the entire map construction is overly complicated and should be replaced, you can achieve the above with something like: `for (const auto &It : m) if (llvm::any_of(It.second, [I](CallInst *CI) { return I == CI; })) return true; return false;`
//		/// already been mapped. If yes then there is no need to test its compatibilility again
bool find(CallInst *I, std::map<CallInst *, std::vector<CallInst *>> m) {		bool find(CallInst *I, std::map<CallInst *, std::vector<CallInst *>> m) {
if (m.size() == 0)		for (auto itr :m)
return false;		for (auto itr1 : (itr.second))
for (auto itr = m.begin(); itr != m.end(); ++itr) {		if (I == itr1) return true;
if ((itr->second).size() == 0)
continue;
else {
for (auto itr1 = (itr->second).begin(); itr1 != (itr->second).end();
++itr1) {
if (I == *itr1)
return true;
}
}
}
return false;		return false;
}		}
		jdoerfert (done): Having a function like this is often a side-effect of improper data-structures. If you need to cache all calls that have been looked at, add a `SmallPtrSet<CallInst *, 8>` to do so instead of iterating over all vectors that are keys in a map.
		abidmalikwaterloo (author, done): See the new patch https://reviews.llvm.org/D90103

/// The following functions check the compatibility of the		/// The following functions check the compatibility of the
// __kmpc_for_static_init_4 call instructions for the fusion		/// __kmpc_for_static_init_4 call instructions for
		/// deletion.

bool checkTheCompatibility(OMPLoopFusion *OLF) {		void checkTheCompatibility(OMPLoopFusion *OLF) {
bool compatible = true;		bool compatible = true;
for (auto itr = OLF->call_init_fini_mapping.begin();		/// for each __kmpc_static_init4 call instruction
itr != OLF->call_init_fini_mapping.end(); ++itr) {		for (auto itr : OLF->call_init_fini_mapping) {
		/// check wheither it has been mapped already
		if (find(itr.first, OLF->call_map)) continue;
std::vector<CallInst *> v;		std::vector<CallInst *> v;
if (find(itr->first, OLF->call_map))		std::vector<Value *> v1;
continue;		/// if not then store the arguments of __kmpc_static_init4 in vector v1
for (auto itr1 = itr; itr1 != OLF->call_init_fini_mapping.end(); ++itr1) {		for (Value *arg : (itr.first)->args())
		jdoerfert (done): Please use llvm data structures and give variables proper names.
		abidmalikwaterloo (author, done): See the new patch https://reviews.llvm.org/D90103
if (itr == itr1)
continue;
else {
std::vector<Value *> v1, v2;
for (Value *arg : (itr->first)->args()) {
v1.push_back(arg);		v1.push_back(arg);
}		/// check the other __kmpc_static_init4 call instructions
for (Value *arg2 : (itr1->first)->args()) {		for (auto itr1 : OLF->call_init_fini_mapping) {
		if ((itr.first) == (itr1.first)) continue;
		std::vector<Value *> v2;
		/// Store the arguments of __kmpc_static_init4 in vector v2
		for (Value *arg2 : (itr1.first)->args())
v2.push_back(arg2);		v2.push_back(arg2);
}		/// check wheither each arguments from both call __kmpc_static_init4 are same
for (auto i = v1.begin(), j = v2.begin();		for (auto i = v1.begin(), j = v2.begin(); i != v1.end() && j != v2.end(); ++i, ++j) {
i != v1.end() && j != v2.end(); ++i, ++j) {		OLF->args_map.insert({*j, *i});
if (isa<Constant>(*i) && isa<Constant>(*j)) {		if (isa<Constant>(*i) && isa<Constant>(*j)) {
if (*i != *j) {		if (*i != *j) {
		// if any constant is not equal, the compatibility test fails
compatible = false;		compatible = false;
break;		break;}
}
} else { // we have a pointer argument		} else { // we have a pointer argument
if (OLF->store_op0_op1.find(*j)->second !=		if (OLF->store_op0_op1.find(*j)->second != OLF->store_op0_op1.find(*i)->second) {
OLF->store_op0_op1.find(*i)->second) {		// if any pointer value is not equal, the compatibility test fails
compatible = false;		compatible = false;
break;		break;}
}
}		}
}		}
if (compatible) {		if (compatible)
errs() << "Success"		v.push_back(itr1.first);
<< "\n";		else
v.push_back(itr1->first);		break; // the adjacent for omp loop is not compatible so there is no need to check others
} else		// therefore we need to break out of the second for loop
break;
}
}		}
		/// if a call instruction has some compatible call instructions then put in the call_map container
OLF->call_map.insert({itr->first, v});		if (v.size() !=0) OLF->call_map.insert({itr.first, v});
		jdoerfert (done): Why does it always return false?
if (!compatible) {		/// make the flag true again for the next instruction checking
compatible = true;		if (!compatible) compatible = true;
}		}
}		}
		jdoerfert (done): This function is quadratic in the number of `__kmpc_static_init4` calls.
		abidmalikwaterloo (author, done): See the new patch https://reviews.llvm.org/D90103
return false;
}

/// The function goes through each BB and check the call instructions		/// The function goes through each BB and check the call instructions
		jdoerfert (not done): Please use modern C++. LLVM is currently based on C++14, for years we allow range based loops: `for (Instruction *I : instructions(F)) {`
/// __kmpc_for_static_init_4 and __kmpc_for_static_fini		/// __kmpc_for_static_init_4 and __kmpc_for_static_fini
/// store the operands of the store instructios		/// store the operands of the store instructions
void runOverTheBlock(Function *F, OMPLoopFusion *OLF) {		void runOverTheBlock(Function &F, OMPLoopFusion *OLF) {
		jdoerfert (not done): Please take a look at `deduplicateRuntimeCalls` and `deleteParallelRegions`. There we utilize a caching system to find calls to OpenMP runtime functions. It is important that we do not cause a constant overhead in this pass by scanning instructions multiple times.
for (Function::iterator FI = F->begin(); FI != F->end(); ++FI) {		for (auto &BB: F) {
for (BasicBlock::iterator BBI = FI->begin(); BBI != FI->end(); ++BBI) {		for (auto &BBI: BB) {
if (CallInst *c = dyn_cast<CallInst>(BBI)) {		if (CallInst *c = dyn_cast<CallInst>(&BBI)) {
if (c->getCalledFunction()->getName() == "__kmpc_for_static_init_4") {		/// if its a __kmpc_static_init4
		if (c->getCalledFunction()->getName() == "__kmpc_for_static_init_4")
OLF->current_call_init_instruction = c;		OLF->current_call_init_instruction = c;
} else if (c->getCalledFunction()->getName() ==		/// if its a __kmpc_static_fini
"__kmpc_for_static_fini") {		if (c->getCalledFunction()->getName() == "__kmpc_for_static_fini")
OLF->call_init_fini_mapping.insert(		OLF->call_init_fini_mapping.insert({OLF->current_call_init_instruction, c});
		jdoerfert (not done): Since you are interested in special stores only it would make sense to look for them explicitly, e.g., by following the uses of arguments passed into the `__kmpc_for_static_init_4` call.
{OLF->current_call_init_instruction, c});
}
} else if (StoreInst *store = dyn_cast<StoreInst>(BBI)) {
OLF->store_op0_op1.insert(
{store->getOperand(1), store->getOperand(0)});
}		}
		/// if its a store instruction
		if (StoreInst *store = dyn_cast<StoreInst>(&BBI))
		/// store the operands
		OLF->store_op0_op1.insert({store->getOperand(1), store->getOperand(0)});
}		}
		jdoerfert (not done): Most of these functions lack documentation and all of them lack comments explaining the code. The names need to be adjusted wrt. the coding standard and to make them expressive. For example, "clean" is nothing I can associate with an action.
}		}
}		}

		jdoerfert (done): We cannot have stray debug output.

		/// The following function delete the __kmpc_static_fini and __kmpc_static_init4
		/// call instructions which are redundent
		jdoerfert (done): This class provides a way to access/iterate over all call sites of a known OpenMP runtime call. Please use that.
		abidmalikwaterloo (author, done): See the new patch https://reviews.llvm.org/D90103 I removed this function in the new implementation.

bool cleanInstructions(OMPLoopFusion *OLF) {		bool cleanInstructions(OMPLoopFusion *OLF) {
		/// By default no change in the IR
bool changed = false;		bool changed = false;
errs() << "[starting cleaning]"		/// Take each instructions that has some compatible instructions
<< "\n";		for (auto itr : OLF->call_map) {
for (auto itr = OLF->call_map.begin(); itr != OLF->call_map.end(); itr++) {		/// count the number of compatible instructions
Instruction *I = OLF->call_init_fini_mapping.find(itr->first)->second;		int count = (itr.second).size();
int count = (itr->second).size();		/// get the __kmpc_static_fini instruction specific to the __kmpc_static_init4
if (count == 0)		/// and delete it
continue;		Instruction *I = OLF->call_init_fini_mapping.find(itr.first)->second;
else
I->eraseFromParent();		I->eraseFromParent();
		/// We have changed the IR
changed = true;		changed = true;
for (auto itr1 = (itr->second).begin(); itr1 != (itr->second).end();		/// loop over the compatible instructions
itr1++) {		for (auto itr1:itr.second) {
Instruction *I1 = *itr1;		Instruction *I1 = itr1;
I1->eraseFromParent();		I1->eraseFromParent();
for (auto itr2 = OLF->call_init_fini_mapping.begin();		for (auto itr2: OLF->call_init_fini_mapping) {
itr2 != OLF->call_init_fini_mapping.end(); itr2++) {		/// Delete the respective __kmpc_static_fini call instruction
if (*itr1 == itr2->first) {		if (itr1 == itr2.first) {
Instruction *I2 = itr2->second;		Instruction *I2 = itr2.second;
if (count == 1 || count == 0)		/// Do not erase the last __kmpc_static_fini call instruction
break;		if (count == 1 || count == 0) break;
else {
I2->eraseFromParent();		I2->eraseFromParent();
count--;		count--;
}		}
}		}
}		}
}		}
}
errs() << "[Ending Cleaning]"
<< "\n";
return changed;		return changed;
}		}

void replace_UseValues(Function *F, OMPLoopFusion *OLF) {		/// The following function replaces the use values
		/// of the deleted __kmpc_static_init4 call instructions with the parent
		/// call function. The __kmpc_static_init4 is a parent call function
		/// if it comes first

		void replace_UseValues(Function &F, OMPLoopFusion *OLF){
std::vector<Instruction *> remove;		std::vector<Instruction *> remove;
for (auto itr = OLF->call_map.begin(); itr != OLF->call_map.end(); itr++) {		for (auto &BB : F)
std::vector<Value *> vm;
for (Value *arg : (itr->first)->args()) {
vm.push_back(arg);
}
for (auto itr1 = (itr->second).begin(); itr1 != (itr->second).end();
itr1++) {
std::vector<Value *> vs;
for (Value *arg : (*itr1)->args()) {
vs.push_back(arg);
}
for (auto vmitr = vm.begin(), vsitr = vs.begin();
vmitr != vm.end() && vsitr != vs.end(); vmitr++, vsitr++) {
if (isa<Constant>(*vmitr))
continue;
for (auto &BB : *F) {
for (auto &II : BB) {		for (auto &II : BB) {
Instruction *It = &II;		Instruction *It = &II;
if (isa<CallInst>(It))		if (isa<CallInst>(It)) continue;
continue;
for (unsigned int k = 0; k < It->getNumOperands(); k++) {		for (unsigned int k = 0; k < It->getNumOperands(); k++){
if (It->getOperand(k) == *vsitr) {		auto temp = OLF->args_map.find(It->getOperand(k));
It->setOperand(k, *vmitr);		if (temp != OLF->args_map.end()){
if (isa<StoreInst>(It) && k > 0) {		It->setOperand(k, temp->second);
remove.push_back(It);		if (isa<StoreInst>(It) && k > 0) remove.push_back(It);
}
}
}		}
}		}
}		}
		for (auto r: remove)
		r->eraseFromParent();
}		}
		jdoerfert (done): The formatting of this function makes it really hard to read it. Please clang-format your patch before uploading it.
		jdoerfert (done): Please do not iterate over the entire function, basically ever. Start with the things you are interested in and go from there. Iterating over the entire function is at some point costly and always wasteful.
		abidmalikwaterloo (author, done): See the new patch https://reviews.llvm.org/D90103 I removed this in the new patch.
}		/// does function printing
}		// void printFunction(Function &F) {
for (auto r = remove.begin(); r != remove.end(); r++)		// F.print(errs(), nullptr);
(*r)->eraseFromParent();		// }
		jdoerfert (done): No commented out code.
		abidmalikwaterloo (author, done): done
}

void printFunction(Function *F) {
errs() << "\nThis is the new IR for the fused stuff\n\n";
F->print(errs(), nullptr);
}

/// Try to delete parallel regions if possible		/// Try to delete parallel regions if possible
bool deleteParallelRegions() {		bool deleteParallelRegions() {
const unsigned CallbackCalleeOperand = 2;		const unsigned CallbackCalleeOperand = 2;

RuntimeFunctionInfo &RFI = RFIs[OMPRTL___kmpc_fork_call];		RuntimeFunctionInfo &RFI = RFIs[OMPRTL___kmpc_fork_call];
if (!RFI.Declaration)		if (!RFI.Declaration)
return false;		return false;
(358 lines not shown)

openmp/runtime/src/kmp_sched.cpp

(559 lines not shown)
if (incr == 1) {
trip_count = plower - pupperDist + 1;		trip_count = plower - pupperDist + 1;
} else if (incr > 1) {		} else if (incr > 1) {
// upper-lower can exceed the limit of signed type		// upper-lower can exceed the limit of signed type
trip_count = (UT)(pupperDist - plower) / incr + 1;		trip_count = (UT)(pupperDist - plower) / incr + 1;
} else {		} else {
trip_count = (UT)(plower - pupperDist) / (-incr) + 1;		trip_count = (UT)(plower - pupperDist) / (-incr) + 1;
}		}
KMP_DEBUG_ASSERT(trip_count);		KMP_DEBUG_ASSERT(trip_count);
		errs() << trip_count <<"[][]\n";
		jdoerfert (done): leftover
switch (schedule) {		switch (schedule) {
case kmp_sch_static: {		case kmp_sch_static: {
if (trip_count <= nth) {		if (trip_count <= nth) {
KMP_DEBUG_ASSERT(		KMP_DEBUG_ASSERT(
__kmp_static == kmp_sch_static_greedy ||		__kmp_static == kmp_sch_static_greedy ||
__kmp_static ==		__kmp_static ==
kmp_sch_static_balanced); // Unknown static scheduling type.		kmp_sch_static_balanced); // Unknown static scheduling type.
if (tid < trip_count)		if (tid < trip_count)
(429 lines not shown)
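The `trip_count` computation in the `kmp_sched.cpp` hunk above takes the difference of the bounds in the unsigned type `UT` because, as its comment notes, upper minus lower can exceed the limit of the signed type. A standalone sketch of the same idea for 32-bit bounds follows; `tripCount` is a hypothetical helper, not libomp code. It converts each bound to unsigned before subtracting, which also avoids signed overflow in the subtraction itself, and returns a 64-bit count so the full 32-bit range fits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: iteration count of
//   for (i = lower; incr > 0 ? i <= upper : i >= upper; i += incr)
// assuming the loop runs at least once (the hunk's KMP_DEBUG_ASSERT(trip_count)
// expects a non-zero count as well).
uint64_t tripCount(int32_t lower, int32_t upper, int32_t incr) {
  assert(incr != 0);
  if (incr > 0) {
    assert(lower <= upper);
    // Unsigned subtraction is modular, so the difference is exact here.
    const uint32_t diff = (uint32_t)upper - (uint32_t)lower;
    return (uint64_t)(diff / (uint32_t)incr) + 1;
  }
  assert(lower >= upper);
  const uint32_t diff = (uint32_t)lower - (uint32_t)upper;
  const uint32_t step = 0u - (uint32_t)incr; // |incr|, safe even for INT32_MIN
  return (uint64_t)(diff / step) + 1;
}
```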