This is an archive of the discontinued LLVM Phabricator instance.

Differential D81036

[OpenMP] Begin Adding Analysis Remarks for OpenMP Best Practises.
Needs Review · Public

Authored by jhuber6 on Jun 2 2020, 2:43 PM.

Details

Reviewers

jdoerfert

Summary

Adding analysis remarks to guide users to OpenMP best practices. The goal is to have a suite of analysis passes that can detect common mistakes in OpenMP code. This currently only checks if the user has specified a parallel region with a constant thread count.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	1,130 ms	libomp.lock::Unknown Unit Message ("")

Event Timeline

jhuber6 created this revision. Jun 2 2020, 2:43 PM

Herald added subscribers: llvm-commits, sstefan1, ormris and 3 others. Jun 2 2020, 2:43 PM

Cool! Some minor comments below.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
423	Early exits are usually easier to read, thus:

if (!UV)
  return;

Maybe we should be more descriptive, telling people that fixed numbers are not future compatible. And we should not issue the remark if we find a worksharing loop with a constant trip count (that is a multiple of this constant). I guess the latter part can just be a TODO for now, maybe also a test case:

#pragma omp parallel for num_threads(8)
for (int i = 0; i < 8; ++i)
  body(i);

That reminds me that we should determine the loop trip count for worksharing loops as well. We could even use that value to feed it into "set_num_threads" to avoid waking more threads than needed.
llvm/test/Transforms/OpenMP/constant_thread_count_analysis.ll
120	Don't think we need this comment ;)

Harbormaster failed remote builds in B58820: Diff 267996! Jun 2 2020, 3:23 PM

jhuber6 marked an inline comment as done. Jun 2 2020, 6:22 PM

jhuber6 added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
423	Might be helpful, something like "Use of OpenMP parallel region with constant number of threads is not future compatible. Consider a dynamic thread count instead." And are there any general ways to extract the loop trip count in LLVM? Depending on where you are in the optimization the loop could take many different forms as far as I'm aware.

Making minor adjustments.

Any recommendations on which best practices to target next? I'm not intimately familiar with OpenMP best practices so I'm not sure which things would qualify.

Bikeshedding: do we really want to do this here?
If we are not worried about the cases where the thread count was a variable originally,
but the variable eventually got folded into a constant,
this can be trivially done as a clang-tidy check, or clang diag.

Harbormaster failed remote builds in B58914: Diff 268173! Jun 3 2020, 7:38 AM

In D81036#2071208, @jhuber6 wrote:

Making minor adjustments.

Any recommendations on which best practices to target next? I'm not intimately familiar with OpenMP best practices so I'm not sure which things would qualify.

We can do one that "in theory" we will optimize soon:
Look for pointer arguments of the outlined parallel body function that are marked readonly (except the first two). For those, tell people that these should be passed by value via firstprivate.
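At the source level, the readonly-pointer suggestion above might correspond to something like the following minimal sketch. The function names `scaleShared` and `scaleFirstprivate` are hypothetical, and the statement that a shared scalar reaches the outlined function as a pointer is an assumption about how clang typically lowers `shared`, not something OpenMP guarantees.

```cpp
#include <vector>

// Hypothetical example: `factor` is only read inside the region. With
// `shared`, the outlined body would typically receive it through a pointer,
// which is the pattern the proposed remark looks for.
std::vector<double> scaleShared(const std::vector<double> &in, double factor) {
  std::vector<double> out(in.size());
  const long n = static_cast<long>(in.size());
#pragma omp parallel for shared(factor)
  for (long i = 0; i < n; ++i)
    out[i] = in[i] * factor;
  return out;
}

// Same computation, but every thread gets its own initialized copy of
// `factor`, so the value can be passed by value to the parallel body.
std::vector<double> scaleFirstprivate(const std::vector<double> &in,
                                      double factor) {
  std::vector<double> out(in.size());
  const long n = static_cast<long>(in.size());
#pragma omp parallel for firstprivate(factor)
  for (long i = 0; i < n; ++i)
    out[i] = in[i] * factor;
  return out;
}
```

Both functions compute the same result; the second only changes how `factor` is handed to the threads.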

You can also look for a __kmpc_barrier just before the end of a parallel region, e.g., in

#pragma omp parallel
{
   some code
   #pragma omp for
   for (...)
      ...
}
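
The barrier case above can be sketched as follows. This assumes the remark would point at the implicit barrier of the worksharing loop, which is immediately followed by the implicit barrier at the end of the parallel region; `fillSquares` is a hypothetical name, and suggesting `nowait` is my reading of the comment rather than something it states.

```cpp
#include <vector>

// Hypothetical helper: fills `data` with squares inside a parallel region.
void fillSquares(std::vector<int> &data) {
  const int n = static_cast<int>(data.size());
#pragma omp parallel
  {
    // ... some per-thread setup code would go here ...

    // `nowait` drops the implicit barrier at the end of the worksharing
    // loop. The implicit barrier at the end of the parallel region still
    // synchronizes all threads before the function returns.
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
      data[i] = i * i;
  }
}
```

Since no code follows the loop inside the region, removing the loop's barrier does not change the result.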

I also think we should improve the parallel annotation in a follow-up step, that is, look at the body and avoid remarks if the usage of a constant thread count seems reasonable. There is also the case where the constant thread count might be reasonable but we can ask for further information, e.g.,

#pragma omp parallel for num_threads(4)
for (int i = 0; i < N; ++i)
  ...

We could tell the user that a constant thread count with an unknown loop trip count might not be what they want. Either make the thread count variable (or remove it), or provide information about the loop bound via operations like __builtin_assume(N % 4 == 0), or __builtin_assume(N == 4 * k).

In D81036#2071289, @lebedev.ri wrote:

Bikeshedding: do we really want to do this here?
If we are not worried about the cases where the thread count was a variable originally,
but the variable eventually got folded into a constant,
this can be trivially done as a clang-tidy check, or clang diag.

There are obviously pros and cons to doing this here. Let me start by saying the thread count check is just a simple starter; the idea is to allow opt-in analyses that try to provide far more insight into the program. Later I also want to modify the code and provide feedback at runtime or based on profiling data.
That said, there are other reasons to put it here, e.g., Fortran ;)
The variable vs. constant distinction could be checked in the frontend, but it is questionable whether the following cases are really different, that is, whether or not we want constant propagation to happen first:

void foo1() {
  #pragma omp parallel num_threads(8)
  {}
}
void foo2() {
  #pragma omp parallel num_threads(NumThreads)
  {}
}
void foo3(unsigned NT = 8) {
  #pragma omp parallel num_threads(NT)
  {}
}

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
423	"Consider using a scalable thread count" + ", or remove the `num_threads` clause."
423	> And are there any general ways to extract the loop tripcount in LLVM? Depending on where you are in the optimization the loop could take many different forms as far as I'm aware.

There is, and yes, loops have different forms. The usual way is to ask ScalarEvolution for the trip count; however, that won't work for worksharing loops. Worksharing loops are actually easy to predict, though. Look for the `__kmpc_for_init` (or similar) call and follow the lower and upper bound pointers. They should be written exactly once, and the difference between those two values is the trip count, at least for clang, as we normalize the loops. More generally, we also need to follow the stride argument and do a slightly more complex computation, but it is still doable. You can even use ScalarEvolution to represent the bounds, stride, and difference. That also allows you to create an expression representing the value in the code (if that is needed later), and it will simplify the expression if possible, e.g., constant fold it.
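
The bounds-and-stride computation described above can be sketched on plain integers as follows. This is only an illustration: `staticTripCount` and `dividesEvenly` are hypothetical helpers, the inclusive-bounds convention is assumed from the patch's `upper - lower + 1`, and the actual runtime's treatment of the stride argument would need to be checked before relying on it.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: number of iterations of a loop that runs from `lower`
// to `upper` (both inclusive, assumed) in steps of `stride`.
std::optional<std::uint64_t> staticTripCount(std::int64_t lower,
                                             std::int64_t upper,
                                             std::int64_t stride) {
  if (stride == 0)
    return std::nullopt; // No progress, trip count is not defined.

  // Work in unsigned arithmetic so the span cannot overflow.
  std::uint64_t span, step;
  if (stride > 0) {
    if (lower > upper)
      return std::uint64_t{0};
    span = static_cast<std::uint64_t>(upper) - static_cast<std::uint64_t>(lower);
    step = static_cast<std::uint64_t>(stride);
  } else {
    if (lower < upper)
      return std::uint64_t{0};
    span = static_cast<std::uint64_t>(lower) - static_cast<std::uint64_t>(upper);
    step = std::uint64_t{0} - static_cast<std::uint64_t>(stride);
  }

  const std::uint64_t steps = span / step;
  if (steps == UINT64_MAX)
    return std::nullopt; // steps + 1 would not fit in 64 bits.
  return steps + 1;
}

// Hypothetical check: true if a constant thread count divides the trip
// count, in which case the remark could be suppressed.
bool dividesEvenly(std::uint64_t tripCount, std::uint64_t numThreads) {
  return numThreads != 0 && tripCount % numThreads == 0;
}
```

For the test case suggested earlier in the review (`num_threads(8)` with `i` running from 0 to 7), `staticTripCount(0, 7, 1)` yields 8 and `dividesEvenly(8, 8)` holds, so no remark would be needed.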

jhuber6 marked an inline comment as done. Jun 4 2020, 9:58 AM

jhuber6 added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
423	The loop body where `__kmpc_for_init` is called is inside the callback function, which might also be separated by another function. I'm guessing the way to handle this would be to start with the function where there's a `__kmpc_init` call and go backwards in the call graph until you find the function that contains the `__kmpc_fork` call and check if there's a corresponding `__kmpc_push_num_threads` call. If there is and the found trip count isn't a multiple of the constant thread count, emit a message. Is that about right? I haven't seen any examples that show traversing the call graph in this way, just iterating over all the functions.

Small changes and adding support for checking use of shared for variables that should be pass-by-value.

jhuber6 marked 2 inline comments as done. Jun 8 2020, 3:45 PM

jhuber6 added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
433	I originally had this function take a function, find the fork_call or fork_teams, and return the callback argument if it existed. I had to take the logic out because otherwise I didn't have an instruction to get a debugloc for. This doesn't handle debug options where the compiler will wrap the parallel region in another function.
llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll
126	Should I combine this into one giant file that just handles OpenMP Analysis stuff?

Harbormaster failed remote builds in B59556: Diff 269370! Jun 8 2020, 4:39 PM

Thanks for the update! With the second analysis we can show why the frontend is not necessarily the perfect place for this: determining readonly happens naturally in the middle-end. (Not to mention the analysis of sufficient conditions to know a transformation would be sound.)

I left a bunch of comments, nothing major though.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
163	I doubt these need to return a value. Can we check if remarks are enabled and only run them in case they are?
350	Why an X-value (`&&`) and not just a reference (`&`)?
466	Nit: -`;` Hoist `RFI` out of the loop. Don't we have a helper to do something "foreach call"... we should. I would not call it `Callback` but `ParallelRegionFn` or similar. Also use proper types not auto when it is not clear and helpful, e.g., for `Callback`. I'd add the word "probably" or "very likely" in the message as it is only known to be the case if the argument also is not modified via a different pointer. (For example, only if it is `noalias` as well we know it is not otherwise modified and can do the transformation ourselves.)
llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll
14	We need to reference the value somehow. Where does this point to now, I mean what is in line 7? Could we include the C source in a comment at the top? We probably want to issue 2 remarks, one for the value, with the value debug info, and one for the directive.
126	If there is no inherent benefit I would not merge them. Having more selective tests is usually better. You can prefix their names to make it easier to categorize them (instead of the current postfix), or put them into a subfolder.

jhuber6 marked 3 inline comments as done. Jun 8 2020, 7:49 PM

jhuber6 added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
163	I think there's an option for checking in here but I haven't tried yet, https://llvm.org/doxygen/classllvm_1_1DiagnosticInfoOptimizationBase.html
350	As far as I know, the Remark is being generated here as an r-value so the callback function can either copy it or move it, but can't take a reference to it since it has no location in memory. An r-value could bind to a const reference but the emitter needs to be mutated to build the remark.

ORE.emit([&]() { return RemarkCB(RemarkKind(DEBUG_TYPE, RemarkName, Inst)); });

I could change this to generate a RemarkKind inside the function and allow a generic reference, but it would pretty much just change whether or not the object is constructed here or when emit calls the lambda function. That's at least my understanding, I'm not an r-value l-value guru.
466	There's the "foreachUse" function but I wasn't sure if it applied here.
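
The r-value explanation in the comment on line 350 above can be illustrated with a small stand-alone sketch. `Remark`, `emitRemark`, and `buildDedupRemark` here are simplified stand-ins written for this example, not the LLVM classes; they only mirror the shape of the callback in the patch.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for an optimization remark that is built by streaming.
struct Remark {
  std::string Text;
  Remark &operator<<(const std::string &S) {
    Text += S;
    return *this;
  }
};

// Stand-in for the patch's emitRemark: the remark is created as a temporary
// and handed to the callback. A temporary binds to `Remark &&` or to
// `const Remark &`, but not to a non-const `Remark &`.
template <typename CallbackT>
Remark emitRemark(const std::string &Name, CallbackT &&CB) {
  return CB(Remark{Name + ": "});
}

// The callback takes the temporary by r-value reference so that it can
// mutate it while building the message, then moves it back out.
Remark buildDedupRemark() {
  auto CB = [](Remark &&R) {
    return std::move(R << "runtime call deduplicated");
  };
  return emitRemark("OpenMPRuntimeDeduplicated", CB);
}
```

Changing the lambda parameter to `Remark &` would fail to compile, because the temporary created in `emitRemark` cannot bind to a non-const l-value reference.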

jhuber6 marked an inline comment as done. Jun 8 2020, 8:08 PM

jhuber6 added inline comments.

llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll
14	I can go ahead and add the source. To print out the value it refers to I'd probably need to get the value from the `__kmpc_fork_call` since the remark is generated from looking at the arguments to the `.omp_outlined` function which doesn't know anything about the actual value. Is it safe to assume that the arguments are in the same order here? Like the third argument to `.omp_outlined` is always the first vararg passed to the callback?

Small changes, added a check for whether remarks are enabled. Added calculation of the loop trip count given a __kmpc_for_static_init call. Waiting on ICV tracking support to associate correct parallel regions before moving forward.

Harbormaster failed remote builds in B59859: Diff 269935! Jun 10 2020, 1:21 PM

jdoerfert added inline comments. Jun 10 2020, 6:21 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
147	Nit: Documentation, also all the functions below please.
350	It's fine, I didn't check this deep, just from the diff I figured I ask.
405	Style: I'd get rid of the braces, makes it (for me) harder to argue if the return below is inside the lambda or not.
415	I think the convention is `llvm::None`, or something similar. Above, just `return upper...` so we can also remove the braces. While at it, remove the else (no else after return).
420	Somewhere in here we did: const unsigned CallbackCalleeOperand = 2; To give the magic number 2 a description. Maybe move it into class scope or duplicate it.
444	I think we should copy the pointer `C` by value here, `[=]` or `[C]`.
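
The capture suggestion in the comment on line 444 above can be illustrated with a minimal stand-alone sketch; `makeReader` is a hypothetical function and is not part of the patch. In the patch the lambda appears to be invoked before the enclosing scope ends, so the point there is mostly about making the intent explicit.

```cpp
#include <functional>

// Hypothetical example: the closure outlives the function that created it.
std::function<int()> makeReader(const int *Ptr) {
  // `[Ptr]` copies the pointer itself into the closure. With `[&]`, the
  // closure would hold a reference to the local parameter `Ptr`, which no
  // longer exists once makeReader returns.
  return [Ptr]() { return *Ptr; };
}
```

Copying a pointer is as cheap as taking a reference to it, so capturing it by value costs nothing and removes the lifetime question.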

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/OpenMPOpt.cpp	108 lines
llvm/test/Transforms/OpenMP/constant_thread_count_analysis.ll	131 lines
llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll	143 lines

Diff 269935

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

#include "llvm/ADT/EnumeratedArray.h"		#include "llvm/ADT/EnumeratedArray.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Frontend/OpenMP/OMPConstants.h"		#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"		#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
		#include "llvm/IR/AbstractCallSite.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/Utils/CallGraphUpdater.h"		#include "llvm/Transforms/Utils/CallGraphUpdater.h"

using namespace llvm;		using namespace llvm;
using namespace omp;		using namespace omp;
using namespace types;		using namespace types;
▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	struct RuntimeFunctionInfo {
}		}

private:		private:
/// Map from functions to all uses of this runtime function contained in		/// Map from functions to all uses of this runtime function contained in
/// them.		/// them.
DenseMap<Function *, std::unique_ptr<UseVector>> UsesMap;		DenseMap<Function *, std::unique_ptr<UseVector>> UsesMap;
};		};

		bool remarksEnabled() {
		auto &Ctx = M.getContext();
		return Ctx.getDiagHandlerPtr()->isAnyRemarkEnabled(DEBUG_TYPE);
		}

/// Run all OpenMP optimizations on the underlying SCC/ModuleSlice.		/// Run all OpenMP optimizations on the underlying SCC/ModuleSlice.
bool run() {		bool run() {
bool Changed = false;		bool Changed = false;

LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size()		LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size()
<< " functions in a slice with " << ModuleSlice.size()		<< " functions in a slice with " << ModuleSlice.size()
<< " functions\n");		<< " functions\n");

Changed |= deduplicateRuntimeCalls();		Changed |= deduplicateRuntimeCalls();
Changed |= deleteParallelRegions();		Changed |= deleteParallelRegions();

		if (remarksEnabled()) {
		analysisConstantThreadNum();
		analysisConstantShared();
		}

return Changed;		return Changed;
}		}

private:		private:
/// Try to delete parallel regions if possible.		/// Try to delete parallel regions if possible.
bool deleteParallelRegions() {		bool deleteParallelRegions() {
const unsigned CallbackCalleeOperand = 2;		const unsigned CallbackCalleeOperand = 2;

▲ Show 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	bool deduplicateRuntimeCalls(Function &F, RuntimeFunctionInfo &RFI,
};		};

if (!ReplVal) {		if (!ReplVal) {
for (Use *U : *UV)		for (Use *U : *UV)
if (CallInst *CI = getCallIfRegularCall(*U, &RFI)) {		if (CallInst *CI = getCallIfRegularCall(*U, &RFI)) {
if (!CanBeMoved(*CI))		if (!CanBeMoved(*CI))
continue;		continue;

auto Remark = [&](OptimizationRemark OR) {		auto Remark = [&](OptimizationRemark &&OR) {
auto newLoc = &*F.getEntryBlock().getFirstInsertionPt();		auto newLoc = &*F.getEntryBlock().getFirstInsertionPt();
return OR << "OpenMP runtime call "		return OR << "OpenMP runtime call "
<< ore::NV("OpenMPOptRuntime", RFI.Name) << " moved to "		<< ore::NV("OpenMPOptRuntime", RFI.Name) << " moved to "
<< ore::NV("OpenMPRuntimeMoves", newLoc->getDebugLoc());		<< ore::NV("OpenMPRuntimeMoves", newLoc->getDebugLoc());
};		};
emitRemark<OptimizationRemark>(CI, "OpenMPRuntimeCodeMotion", Remark);		emitRemark<OptimizationRemark>(CI, "OpenMPRuntimeCodeMotion", Remark);

CI->moveBefore(&*F.getEntryBlock().getFirstInsertionPt());		CI->moveBefore(&*F.getEntryBlock().getFirstInsertionPt());
ReplVal = CI;		ReplVal = CI;
break;		break;
}		}
if (!ReplVal)		if (!ReplVal)
return false;		return false;
Show All 13 Lines	bool deduplicateRuntimeCalls(Function &F, RuntimeFunctionInfo &RFI,

bool Changed = false;		bool Changed = false;
auto ReplaceAndDeleteCB = [&](Use &U, Function &Caller) {		auto ReplaceAndDeleteCB = [&](Use &U, Function &Caller) {
CallInst *CI = getCallIfRegularCall(U, &RFI);		CallInst *CI = getCallIfRegularCall(U, &RFI);
if (!CI || CI == ReplVal || &F != &Caller)		if (!CI || CI == ReplVal || &F != &Caller)
return false;		return false;
assert(CI->getCaller() == &F && "Unexpected call!");		assert(CI->getCaller() == &F && "Unexpected call!");

auto Remark = [&](OptimizationRemark OR) {		auto Remark = [&](OptimizationRemark &&OR) {
return OR << "OpenMP runtime call "		return OR << "OpenMP runtime call "
<< ore::NV("OpenMPOptRuntime", RFI.Name) << " deduplicated";		<< ore::NV("OpenMPOptRuntime", RFI.Name) << " deduplicated";
};		};
emitRemark<OptimizationRemark>(CI, "OpenMPRuntimeDeduplicated", Remark);		emitRemark<OptimizationRemark>(CI, "OpenMPRuntimeDeduplicated", Remark);

CGUpdater.removeCallSite(*CI);		CGUpdater.removeCallSite(*CI);
CI->replaceAllUsesWith(ReplVal);		CI->replaceAllUsesWith(ReplVal);
CI->eraseFromParent();		CI->eraseFromParent();
++NumOpenMPRuntimeCallsDeduplicated;		++NumOpenMPRuntimeCallsDeduplicated;
Changed = true;		Changed = true;
return true;		return true;
};		};
RFI.foreachUse(ReplaceAndDeleteCB);		RFI.foreachUse(ReplaceAndDeleteCB);

return Changed;		return Changed;
}		}

		Optional<unsigned> getOpenMPStaticTripCount(CallInst *CI) {
		auto getStoredConst = [&](Value *Arg) -> ConstantInt * {
		for (User *Defs : Arg->users()) {
		if (StoreInst *Inst = dyn_cast<StoreInst>(Defs)) {
		if (ConstantInt *C = dyn_cast<ConstantInt>(Inst->getValueOperand()))
		return C;
		}
		}
		return nullptr;
		};

		auto lower = getStoredConst(CI->getArgOperand(4));
		auto upper = getStoredConst(CI->getArgOperand(5));
		if (upper && lower) {
		unsigned trip = upper->getSExtValue() - lower->getSExtValue() + 1;
		return {trip};
		} else {
		return {};
		}
		}

		Function *getParallelRegionEntry(CallInst *CI) {
		auto *ParallelRegionFn = &CI->getOperandUse(2);
		AbstractCallSite ACS(ParallelRegionFn);
		return ACS.getCalledFunction();
		}

		/// Report uses of OpenMP num_threads(n) clause with a constant thread count.
		void analysisConstantThreadNum() {
		auto &RFI = RFIs[OMPRTL___kmpc_push_num_threads];
		for (Function *F : SCC) {
		auto UV = RFI.getUseVector(F);
		if (!UV)
		return;
		for (Use *U : *UV) {
		if (CallInst *CI = getCallIfRegularCall(*U, &RFI)) {
		auto *Arg = CI->getArgOperand(RFI.getNumArgs() - 1);
		if (ConstantInt *C = dyn_cast<ConstantInt>(Arg)) {

		// TODO: Check if OpenMP Parallel region applies to a loop with a
		// static trip count that is a multiple of the thread count
		auto Remark = [&](OptimizationRemarkAnalysis &&ORA) {
		return ORA << "Use of OpenMP parallel region with a constant ("
		<< ore::NV("ConstantThreadNumber", C)
		<< ") number of threads is not future compatible. "
		<< "Consider using a scalable thread count, "
		<< "or remove the 'num_threads' clause.";
		};
		emitRemark<OptimizationRemarkAnalysis>(
		CI, "OpenMPAnalysisConstantThreadCount", Remark);
		}
		}
		}
		}

		return;
		}

		// Report uses of OpenMP shared clause that should be firstprivate
		void analysisConstantShared() {
		auto &RFI = RFIs[OMPRTL___kmpc_fork_call];
		for (Function *F : SCC) {
		auto UV = RFI.getUseVector(F);
		if (!UV)
		return;

		for (Use *U : *UV) {
		if (CallInst *CI = getCallIfRegularCall(*U, &RFI)) {
		auto *ParallelRegionFn = getParallelRegionEntry(CI);
		for (unsigned i = 2; i < ParallelRegionFn->arg_size(); ++i) {
		auto *Arg = ParallelRegionFn->getArg(i);
		if (dyn_cast<PointerType>(Arg->getType()) &&
		ParallelRegionFn->hasParamAttribute(i,
		Attribute::AttrKind::ReadOnly)) {

		// TODO: Get DebugInfo for shared argument
		auto Remark = [&](OptimizationRemarkAnalysis &&ORA) {
		return ORA << "Use of OpenMP shared clause with a value "
		<< "that should likely be pass-by-value. "
		<< "Consider using the firstprivate clause instead";
		};
		emitRemark<OptimizationRemarkAnalysis>(
		CI, "OpenMPAnalysisConstantShared", Remark);
		}
		}
		}
		}
		}

		return;
		}

/// Collect arguments that represent the global thread id in \p GTIdArgs.		/// Collect arguments that represent the global thread id in \p GTIdArgs.
void collectGlobalThreadIdArguments(SmallSetVector<Value *, 16> &GTIdArgs) {		void collectGlobalThreadIdArguments(SmallSetVector<Value *, 16> &GTIdArgs) {
// TODO: Below we basically perform a fixpoint iteration with a pessimistic		// TODO: Below we basically perform a fixpoint iteration with a pessimistic
// initialization. We could define an AbstractAttribute instead and		// initialization. We could define an AbstractAttribute instead and
// run the Attributor here once it can be run as an SCC pass.		// run the Attributor here once it can be run as an SCC pass.

// Helper to check the argument \p ArgNo at all call sites of \p F for		// Helper to check the argument \p ArgNo at all call sites of \p F for
// a GTId.		// a GTId.
▲ Show 20 Lines • Show All 152 Lines • ▼ Show 20 Lines	#include "llvm/Frontend/OpenMP/OMPKinds.def"
/// optimization attempt		/// optimization attempt
///		///
/// The remark is built using a callback function provided by the caller that		/// The remark is built using a callback function provided by the caller that
/// takes a RemarkKind as input and returns a RemarkKind.		/// takes a RemarkKind as input and returns a RemarkKind.
template <typename RemarkKind,		template <typename RemarkKind,
typename RemarkCallBack = function_ref<RemarkKind(RemarkKind &&)>>		typename RemarkCallBack = function_ref<RemarkKind(RemarkKind &&)>>
void emitRemark(Instruction *Inst, StringRef RemarkName,		void emitRemark(Instruction *Inst, StringRef RemarkName,
RemarkCallBack &&RemarkCB) {		RemarkCallBack &&RemarkCB) {

Function *F = Inst->getParent()->getParent();		Function *F = Inst->getParent()->getParent();
auto &ORE = OREGetter(F);		auto &ORE = OREGetter(F);

ORE.emit([&]() {		ORE.emit([&]() {
return RemarkCB(RemarkKind(DEBUG_TYPE, RemarkName, Inst));		return RemarkCB(RemarkKind(DEBUG_TYPE, RemarkName, Inst));
});		});
}		}

▲ Show 20 Lines • Show All 141 Lines • Show Last 20 Lines

llvm/test/Transforms/OpenMP/constant_thread_count_analysis.ll

This file was added.

				; RUN: opt -openmpopt -pass-remarks-analysis=openmp-opt -disable-output < %s 2>&1 | FileCheck %s
				; RUN: opt -passes=openmpopt -pass-remarks-analysis=openmp-opt -disable-output < %s 2>&1 | FileCheck %s
				; ModuleID = 'constant_thread_count_analysis.ll'
				source_filename = "constant_thread_count_analysis.c"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-pc-linux-gnu"

				%struct.ident_t = type { i32, i32, i32, i32, i8* }

				;
				; extern void do_work();
				;
				; void constant(void) {
				; #pragma omp parallel num_threads(2)
				; { do_work(); }
				; }
				;
				; void variable(int n) {
				; #pragma omp parallel num_threads(n)
				; { do_work(); }
				; }
				;

				@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@0 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8

				; CHECK: remark: constant_thread_count_analysis.c:6:5: Use of OpenMP parallel region with a constant (2) number of threads is not future compatible. Consider using a scalable thread count, or remove the 'num_threads' clause.
				define dso_local void @constant() local_unnamed_addr !dbg !13 {
				%1 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @0)
				call void @__kmpc_push_num_threads(%struct.ident_t* nonnull @0, i32 %1, i32 2), !dbg !16
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*)), !dbg !16
				ret void, !dbg !17
				}

				define void @variable(i32 %0) local_unnamed_addr !dbg !18 {
				%2 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @0)
				call void @__kmpc_push_num_threads(%struct.ident_t* nonnull @0, i32 %2, i32 %0), !dbg !24
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined..2 to void (i32*, i32*, ...)*)), !dbg !24
				ret void, !dbg !25
				}

				define void @.omp_outlined._debug__() !dbg !26 {
				call void (...) @do_work(), !dbg !36
				ret void, !dbg !38
				}

				define void @.omp_outlined.(i32* %0, i32* %1) !dbg !39 {
				call void @.omp_outlined._debug__(), !dbg !43
				ret void, !dbg !43
				}

				define void @.omp_outlined._debug__.1() unnamed_addr !dbg !44 {
				call void (...) @do_work(), !dbg !48
				ret void, !dbg !50
				}

				define void @.omp_outlined..2(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1) !dbg !51 {
				call void @.omp_outlined._debug__.1(), !dbg !55
				ret void, !dbg !55
				}

				declare !dbg !4 void @do_work(...)

				declare i32 @__kmpc_global_thread_num(%struct.ident_t*)

				declare void @__kmpc_push_num_threads(%struct.ident_t*, i32, i32)

				declare !callback !56 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!7, !8, !9, !10, !11}
				!llvm.ident = !{!12}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0 ", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, splitDebugInlining: false, nameTableKind: None)
				!1 = !DIFile(filename: "constant_thread_count_analysis.c", directory: "/tmp")
				!2 = !{}
				!3 = !{!4}
				!4 = !DISubprogram(name: "do_work", scope: !1, file: !1, line: 3, type: !5, spFlags: DISPFlagOptimized, retainedNodes: !2)
				!5 = !DISubroutineType(types: !6)
				!6 = !{null, null}
				!7 = !{i32 7, !"Dwarf Version", i32 4}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}
				!10 = !{i32 7, !"PIC Level", i32 2}
				!11 = !{i32 7, !"PIE Level", i32 2}
				!12 = !{!"clang version 10.0.0 "}
				!13 = distinct !DISubprogram(name: "constant", scope: !1, file: !1, line: 5, type: !14, scopeLine: 5, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
				!14 = !DISubroutineType(types: !15)
				!15 = !{null}
				!16 = !DILocation(line: 6, column: 5, scope: !13)
				!17 = !DILocation(line: 8, column: 1, scope: !13)
				!18 = distinct !DISubprogram(name: "variable", scope: !1, file: !1, line: 10, type: !19, scopeLine: 10, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !22)
				!19 = !DISubroutineType(types: !20)
				!20 = !{null, !21}
				!21 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!22 = !{!23}
				!23 = !DILocalVariable(name: "n", arg: 1, scope: !18, file: !1, line: 10, type: !21)
				!24 = !DILocation(line: 11, column: 5, scope: !18)
				!25 = !DILocation(line: 13, column: 1, scope: !18)
				!26 = distinct !DISubprogram(name: ".omp_outlined._debug__", scope: !1, file: !1, line: 7, type: !27, scopeLine: 7, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !33)
				!27 = !DISubroutineType(types: !28)
				!28 = !{null, !29, !29}
				!29 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !30)
				!30 = !DIDerivedType(tag: DW_TAG_restrict_type, baseType: !31)
				!31 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !32, size: 64)
				!32 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !21)
				!33 = !{!34, !35}
				!34 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !26, type: !29, flags: DIFlagArtificial)
				!35 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !26, type: !29, flags: DIFlagArtificial)
				!36 = !DILocation(line: 7, column: 7, scope: !37)
				!37 = distinct !DILexicalBlock(scope: !26, file: !1, line: 7, column: 5)
				!38 = !DILocation(line: 7, column: 18, scope: !26)
				!39 = distinct !DISubprogram(name: ".omp_outlined.", scope: !1, file: !1, line: 7, type: !27, scopeLine: 7, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !40)
				!40 = !{!41, !42}
				!41 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !39, type: !29, flags: DIFlagArtificial)
				!42 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !39, type: !29, flags: DIFlagArtificial)
				!43 = !DILocation(line: 7, column: 5, scope: !39)
				!44 = distinct !DISubprogram(name: ".omp_outlined._debug__.1", scope: !1, file: !1, line: 12, type: !27, scopeLine: 12, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !45)
				!45 = !{!46, !47}
				!46 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !44, type: !29, flags: DIFlagArtificial)
				jdoerfert (inline comment, not done): Don't think we need this comment ;)
				!47 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !44, type: !29, flags: DIFlagArtificial)
				!48 = !DILocation(line: 12, column: 7, scope: !49)
				!49 = distinct !DILexicalBlock(scope: !44, file: !1, line: 12, column: 5)
				!50 = !DILocation(line: 12, column: 18, scope: !44)
				!51 = distinct !DISubprogram(name: ".omp_outlined..2", scope: !1, file: !1, line: 12, type: !27, scopeLine: 12, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !52)
				!52 = !{!53, !54}
				!53 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !51, type: !29, flags: DIFlagArtificial)
				!54 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !51, type: !29, flags: DIFlagArtificial)
				!55 = !DILocation(line: 12, column: 5, scope: !51)
				!56 = !{!57}
				!57 = !{i64 2, i64 -1, i64 -1, i1 true}
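
The test above expects a remark for `@constant`, where the third operand of `__kmpc_push_num_threads` is the literal `i32 2`, and none for `@variable`, where it is the function argument. A rough standalone sketch of that decision follows. `ThreadCountArg` and `constantThreadCountRemark` are hypothetical names made up for illustration and do not appear in OpenMPOpt.cpp; only the message text is taken from the CHECK line.

```cpp
#include <string>

// Hypothetical model of the thread-count operand passed to
// __kmpc_push_num_threads: either a compile-time constant or not.
struct ThreadCountArg {
  bool IsConstant; // true for `i32 2`, false for an SSA value such as `%0`
  int Value;       // only meaningful when IsConstant is true
};

// Returns the remark text for a constant thread count, or an empty string
// when the count is not a constant and no remark should be issued.
inline std::string constantThreadCountRemark(const ThreadCountArg &Arg) {
  if (!Arg.IsConstant)
    return "";
  return "Use of OpenMP parallel region with a constant (" +
         std::to_string(Arg.Value) +
         ") number of threads is not future compatible. Consider using a "
         "scalable thread count, or remove the 'num_threads' clause.";
}
```

Under these assumptions, `constantThreadCountRemark({true, 2})` reproduces the message in the CHECK line, and `constantThreadCountRemark({false, 0})` yields no remark.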

llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll

This file was added.

				; RUN: opt -openmpopt -pass-remarks-analysis=openmp-opt -disable-output < %s 2>&1 | FileCheck %s
				; RUN: opt -passes=openmpopt -pass-remarks-analysis=openmp-opt -disable-output < %s 2>&1 | FileCheck %s
				; ModuleID = 'shared_firstprivate_analysis.c'
				source_filename = "shared_firstprivate_analysis.c"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-pc-linux-gnu"

				;
				; extern void do_work(int);
				;
				; void shared (int i) {
				; #pragma omp parallel shared(i)
				; { do_work(i); }
				; }
				jdoerfert (inline comment, not done): We need to reference the value somehow. Where does this point to now, i.e., what is in line 7? Could we include the C source in a comment at the top? We probably want to issue two remarks: one for the value, with the value's debug info, and one for the directive.
				jhuber6 (author, done): I can go ahead and add the source. To print the value it refers to, I'd probably need to get the value from the `__kmpc_fork_call`, since the remark is generated by looking at the arguments to the `.omp_outlined` function, which doesn't know anything about the actual value. Is it safe to assume that the arguments are in the same order here? That is, is the third argument to `.omp_outlined` always the first vararg passed to the callback?
				;
				; void firstprivate (int i) {
				; #pragma omp parallel firstprivate(i)
				; { do_work(i); }
				; }
				;

				%struct.ident_t = type { i32, i32, i32, i32, i8* }

				@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@0 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8

				; CHECK: remark: shared_firstprivate_analysis.c:7:5: Use of OpenMP shared clause with a value that should likely be pass-by-value. Consider using the firstprivate clause instead
				define dso_local void @shared(i32 %0) local_unnamed_addr #0 !dbg !14 {
				%2 = alloca i32, align 4
				call void @llvm.dbg.value(metadata i32 %0, metadata !16, metadata !DIExpression()), !dbg !17
				store i32 %0, i32* %2, align 4, !tbaa !18
				call void @llvm.dbg.value(metadata i32* %2, metadata !16, metadata !DIExpression(DW_OP_deref)), !dbg !17
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*), i32* nonnull %2) #5, !dbg !22
				ret void, !dbg !26
				}

				declare !dbg !4 void @do_work(i32) local_unnamed_addr #1

				define internal void @.omp_outlined.(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture readonly dereferenceable(4) %2) #2 !dbg !27 {
				%4 = load i32, i32* %2, align 4, !dbg !40, !tbaa !18
				tail call void @do_work(i32 %4) #5, !dbg !48
				ret void, !dbg !40
				}

				declare !callback !50 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) local_unnamed_addr

				declare void @llvm.dbg.value(metadata, metadata, metadata)

				define dso_local void @firstprivate(i32 %0) local_unnamed_addr #0 !dbg !52 {
				%2 = zext i32 %0 to i64, !dbg !56
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @0, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i64)* @.omp_outlined..2 to void (i32*, i32*, ...)*), i64 %2) #5, !dbg !56
				ret void, !dbg !57
				}

				define internal void @.omp_outlined..2(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i64 %2) #2 !dbg !58 {
				%4 = trunc i64 %2 to i32
				tail call void @do_work(i32 %4) #5, !dbg !76
				ret void, !dbg !78
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!8, !9, !10, !11, !12}
				!llvm.ident = !{!13}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0 ", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, splitDebugInlining: false, nameTableKind: None)
				!1 = !DIFile(filename: "shared_firstprivate_analysis.c", directory: "/tmp")
				!2 = !{}
				!3 = !{!4}
				!4 = !DISubprogram(name: "do_work", scope: !1, file: !1, line: 3, type: !5, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !2)
				!5 = !DISubroutineType(types: !6)
				!6 = !{null, !7}
				!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!8 = !{i32 7, !"Dwarf Version", i32 4}
				!9 = !{i32 2, !"Debug Info Version", i32 3}
				!10 = !{i32 1, !"wchar_size", i32 4}
				!11 = !{i32 7, !"PIC Level", i32 2}
				!12 = !{i32 7, !"PIE Level", i32 2}
				!13 = !{!"clang version 10.0.0 "}
				!14 = distinct !DISubprogram(name: "shared", scope: !1, file: !1, line: 6, type: !5, scopeLine: 6, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !15)
				!15 = !{!16}
				!16 = !DILocalVariable(name: "i", arg: 1, scope: !14, file: !1, line: 6, type: !7)
				!17 = !DILocation(line: 0, scope: !14)
				!18 = !{!19, !19, i64 0}
				!19 = !{!"int", !20, i64 0}
				!20 = !{!"omnipotent char", !21, i64 0}
				!21 = !{!"Simple C/C++ TBAA"}
				!22 = !DILocation(line: 7, column: 5, scope: !14)
				!23 = !{!24, !25, i64 16}
				!24 = !{!"ident_t", !19, i64 0, !19, i64 4, !19, i64 8, !19, i64 12, !25, i64 16}
				!25 = !{!"any pointer", !20, i64 0}
				!26 = !DILocation(line: 9, column: 1, scope: !14)
				!27 = distinct !DISubprogram(name: ".omp_outlined.", scope: !1, file: !1, line: 8, type: !28, scopeLine: 8, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !35)
				!28 = !DISubroutineType(types: !29)
				!29 = !{null, !30, !30, !34}
				!30 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !31)
				!31 = !DIDerivedType(tag: DW_TAG_restrict_type, baseType: !32)
				!32 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !33, size: 64)
				!33 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
				!34 = !DIDerivedType(tag: DW_TAG_reference_type, baseType: !7, size: 64)
				!35 = !{!36, !37, !38}
				!36 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !27, type: !30, flags: DIFlagArtificial)
				!37 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !27, type: !30, flags: DIFlagArtificial)
				!38 = !DILocalVariable(name: "i", arg: 3, scope: !27, type: !34, flags: DIFlagArtificial)
				!39 = !DILocation(line: 0, scope: !27)
				!40 = !DILocation(line: 8, column: 5, scope: !27)
				!41 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !42, type: !30, flags: DIFlagArtificial)
				!42 = distinct !DISubprogram(name: ".omp_outlined._debug__", scope: !1, file: !1, line: 8, type: !28, scopeLine: 8, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !43)
				!43 = !{!41, !44, !45}
				!44 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !42, type: !30, flags: DIFlagArtificial)
				!45 = !DILocalVariable(name: "i", arg: 3, scope: !42, file: !1, line: 6, type: !34)
				!46 = !DILocation(line: 0, scope: !42, inlinedAt: !47)
				!47 = distinct !DILocation(line: 8, column: 5, scope: !27)
				!48 = !DILocation(line: 8, column: 7, scope: !49, inlinedAt: !47)
				!49 = distinct !DILexicalBlock(scope: !42, file: !1, line: 8, column: 5)
				!50 = !{!51}
				!51 = !{i64 2, i64 -1, i64 -1, i1 true}
				!52 = distinct !DISubprogram(name: "firstprivate", scope: !1, file: !1, line: 11, type: !5, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !53)
				!53 = !{!54}
				!54 = !DILocalVariable(name: "i", arg: 1, scope: !52, file: !1, line: 11, type: !7)
				!55 = !DILocation(line: 0, scope: !52)
				!56 = !DILocation(line: 12, column: 5, scope: !52)
				!57 = !DILocation(line: 14, column: 1, scope: !52)
				!58 = distinct !DISubprogram(name: ".omp_outlined..2", scope: !1, file: !1, line: 13, type: !59, scopeLine: 13, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !62)
				!59 = !DISubroutineType(types: !60)
				!60 = !{null, !30, !30, !61}
				!61 = !DIBasicType(name: "long unsigned int", size: 64, encoding: DW_ATE_unsigned)
				jhuber6 (author, done): Should I combine this into one giant file that just handles OpenMP analysis stuff?
				jdoerfert (not done): If there is no inherent benefit, I would not merge them. Having more selective tests is usually better. You can prefix their names to make them easier to categorize (instead of the current postfix), or put them into a subfolder.
				!62 = !{!63, !64, !65}
				!63 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !58, type: !30, flags: DIFlagArtificial)
				!64 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !58, type: !30, flags: DIFlagArtificial)
				!65 = !DILocalVariable(name: "i", arg: 3, scope: !58, type: !61, flags: DIFlagArtificial)
				!66 = !DILocation(line: 0, scope: !58)
				!67 = !DILocalVariable(name: ".global_tid.", arg: 1, scope: !68, type: !30, flags: DIFlagArtificial)
				!68 = distinct !DISubprogram(name: ".omp_outlined._debug__.1", scope: !1, file: !1, line: 13, type: !69, scopeLine: 13, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !71)
				!69 = !DISubroutineType(types: !70)
				!70 = !{null, !30, !30, !7}
				!71 = !{!67, !72, !73}
				!72 = !DILocalVariable(name: ".bound_tid.", arg: 2, scope: !68, type: !30, flags: DIFlagArtificial)
				!73 = !DILocalVariable(name: "i", arg: 3, scope: !68, file: !1, line: 11, type: !7)
				!74 = !DILocation(line: 0, scope: !68, inlinedAt: !75)
				!75 = distinct !DILocation(line: 13, column: 5, scope: !58)
				!76 = !DILocation(line: 13, column: 7, scope: !77, inlinedAt: !75)
				!77 = distinct !DILexicalBlock(scope: !68, file: !1, line: 13, column: 5)
				!78 = !DILocation(line: 13, column: 5, scope: !58)
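
On the inline question about argument order: the `!callback` metadata on `__kmpc_fork_call` in this test, `!{i64 2, i64 -1, i64 -1, i1 true}`, can be read as "operand 2 is the callee, its first two parameters are not visible at the call site, and the variadic arguments are forwarded in order". Assuming that reading holds, the index arithmetic is sketched below. `forkCallOperandFor` and the two constants are hypothetical names for illustration, not part of the patch.

```cpp
#include <cassert>

// Fixed parameters of __kmpc_fork_call: (ident_t *, i32 argc, microtask).
constexpr int NumFixedBrokerParams = 3;
// Callee parameters listed explicitly in the !callback metadata (the two -1
// entries, i.e. .global_tid. and .bound_tid.).
constexpr int NumEncodedCalleeParams = 2;

// Maps a zero-based parameter index of the outlined function to the
// zero-based operand index at the __kmpc_fork_call call site. Returns -1 for
// the first two parameters, which the runtime supplies rather than the caller.
inline int forkCallOperandFor(int OutlinedParamIdx) {
  assert(OutlinedParamIdx >= 0 && "parameter index must be non-negative");
  if (OutlinedParamIdx < NumEncodedCalleeParams)
    return -1;
  return NumFixedBrokerParams + (OutlinedParamIdx - NumEncodedCalleeParams);
}
```

With these assumptions, `forkCallOperandFor(2)` is 3: the third parameter of `.omp_outlined.` corresponds to the first variadic operand of the fork call, which is `%2` (the alloca for `i`) in `@shared`.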

Revision Contents

Diff 269935

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

llvm/test/Transforms/OpenMP/constant_thread_count_analysis.ll

llvm/test/Transforms/OpenMP/shared_firstprivate_analysis.ll
