This is an archive of the discontinued LLVM Phabricator instance.

Differential D97393

[OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic
ClosedPublic

Authored by Leporacanthicus on Feb 24 2021, 8:18 AM.

Details

Reviewers

Meinersbur
jdoerfert
fghanim
ftynse
kiranktp
SouraVX
kiranchandramohan

Commits

rG517c3aee4de5: [OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic

Summary

The implementation supports static schedule for Fortran do loops. This
implements the dynamic variant of the same concept.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Leporacanthicus created this revision. Feb 24 2021, 8:18 AM

Herald added a reviewer: ftynse. Feb 24 2021, 8:18 AM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 17 others.

Leporacanthicus requested review of this revision. Feb 24 2021, 8:18 AM

Herald added projects: Restricted Project, Restricted Project. Feb 24 2021, 8:18 AM

Herald added subscribers: llvm-commits, sstefan1, stephenneuendorffer, nicolasvasilache.

Please note that this is not intended as the final commit, but rather as a basis for asking for some advice on how to move forward.

The main stumbling point, which may be my lack of understanding of what it's supposed to do: the CanonicalLoopInfo assumes that the cond block has a CmpInst as the first instruction. In the dynamic case, the corresponding block [in my understanding] starts with a call instruction to fetch the "next" set of data to process. This causes the assertOK to fail, hence it is commented out on line 1300 in the patch.

kiranchandramohan added reviewers: kiranktp, SouraVX, kiranchandramohan. Feb 24 2021, 8:27 AM

SouraVX added a project: Restricted Project. Feb 24 2021, 8:29 AM

Could you please add a unit test for this (as we don't have clang, flang, or MLIR interfacing for dynamic workshare loops)?

kiranchandramohan retitled this revision from Add support for OpenMP do schedule dynamic to [OpenMP IRBuilder] Add support for OpenMP do schedule dynamic. Feb 24 2021, 9:05 AM

jdoerfert added inline comments. Feb 24 2021, 9:26 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1217	I think we use doxygen comments `///` for static functions as well.
1335	I don't know what this does, but it looks brittle. We should have handles for all values, e.g., "the old condition", which we can remove if necessary by following the handle and the operands (with a single use).

kiranchandramohan retitled this revision from [OpenMP IRBuilder] Add support for OpenMP do schedule dynamic to [OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic. Feb 24 2021, 9:37 AM

Harbormaster completed remote builds in B90620: Diff 326102. Feb 24 2021, 9:48 AM

Meinersbur added inline comments. Feb 24 2021, 10:27 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1250–1251	The InsertionPointTy of `Loc` is not used with methods that apply to a canonical loop. Consider passing only the DebugLoc. The same seems to apply to `createStaticWorkshareLoop`.
1262–1281	Since these are shared with `createStaticWorkshareLoop`, did you consider extracting them into a common function?
1297–1298	Usually the name of the parameter comes before the argument.

Leporacanthicus added inline comments. Feb 24 2021, 10:52 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1217	No problem, I will update - although the other functions in this file aren't using that style (see line 1054-1057) - should I fix all of them (in a separate commit, I expect?) or just the two new functions?
1335	Agreed. There needs to be a better way to do this... ;) What it "does" is remove the "instructions after the ones I added". What I would like to do is actually replace the existing `cond` with a new one[1] (and, assuming nothing else uses that original block, remove it). There's no good way to do that with the current interface on CanonicalLoopInfo, so I did this to at least get me to a BasicBlock that doesn't get rejected for having two different terminations (is that the right term?). [1] That's what I think I want to do. My understanding, which is mainly based on what clang++ generates for "the same" C++ code as the Fortran code I'm experimenting with here, is that my `cond` BB should contain a call to `kmpc_dispatch_next`, check the return value from that, and either exit or keep going. I may be completely wrong in this understanding. The original code, which I copied from the static variant, does a compare against the full loop count, but in `dynamic` the work is not necessarily finished in order. Happy to be corrected on these things; I'm new to both Fortran and OpenMP.

In D97393#2585126, @Leporacanthicus wrote:

The main stumbling point, which may be my lack of understanding of what it's supposed to do: the CanonicalLoopInfo assumes that the cond block has a CmpInst as the first instruction. In the dynamic case, the corresponding block [in my understanding] starts with a call instruction to fetch the "next" set of data to process. This causes the assertOK to fail, hence it is commented out on line 1300 in the patch.

After applying worksharing-loop, the result is not a canonical loop nest anymore (one reason being that the number of iterations is not known in advance). Hence, the CanonicalLoopInfo structure does not need to be preserved[*].

(*) I am working on making the chunk-loop accessible for loop transformations in OpenMP 6.0, which would require representing it as a CanonicalLoopInfo.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1283–1284	With a chunk size that is not one, you need at least two loops: one for all iterations of a chunk and another surrounding while-loop that asks the runtime for the next chunk. This two-loop structure should also apply to the static schedule if the chunk is set and is greater than one. I don't see this in `createStaticWorkshareLoop`; it might be broken for these cases.
1303	With dynamic schedule, the trip count is not known in advance.
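
The two-loop structure described in the comment on lines 1283–1284 can be sketched in plain C++. This is only an illustration under stated assumptions: `MockDispatcher` and `runChunkedLoop` are hypothetical names, and the dispatcher is a single-threaded stand-in for the runtime, not the real `__kmpc_dispatch_next_*` interface. Iterations are numbered from 1 with an inclusive upper bound, which is an assumption based on the bounds stored in the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the OpenMP runtime dispatcher (an assumption for
// illustration only). It hands out inclusive [lower, upper] ranges of at most
// `chunk` iterations until the iteration space is exhausted.
struct MockDispatcher {
  std::uint64_t next;  // first iteration that has not been handed out yet
  std::uint64_t last;  // last iteration of the whole loop (inclusive)
  std::uint64_t chunk; // chunk size, expected to be at least 1

  // Returns false once no work is left, which is the "check the return value
  // and either exit or keep going" shape discussed in the review.
  bool dispatchNext(std::uint64_t &lower, std::uint64_t &upper) {
    if (next > last)
      return false;
    lower = next;
    upper = std::min(last, next + chunk - 1);
    next = upper + 1;
    return true;
  }
};

// Runs iterations 1..tripCount and records the order in which they execute.
std::vector<std::uint64_t> runChunkedLoop(std::uint64_t tripCount,
                                          std::uint64_t chunk) {
  if (chunk == 0)
    chunk = 1; // treat a missing chunk size as the default of one
  std::vector<std::uint64_t> visited;
  MockDispatcher dispatcher{1, tripCount, chunk};
  std::uint64_t lower = 0, upper = 0;
  // Outer loop: ask the dispatcher for the next chunk until none is left.
  while (dispatcher.dispatchNext(lower, upper)) {
    // Inner loop: execute every iteration of the chunk just received.
    for (std::uint64_t i = lower; i <= upper; ++i)
      visited.push_back(i);
  }
  return visited;
}
```

Calling `runChunkedLoop(7, 3)` would receive the chunks [1, 3], [4, 6] and [7, 7] in turn. With several threads sharing one dispatcher, each thread would see only some of the chunks, which is why the per-thread trip count cannot be known in advance.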

Updated files based on selected review comments.
Fixed a few issues with the code generation.
Added a basic test (similar to the static workshare loop test).

Herald added subscribers: dcaballe, cota. Mar 11 2021, 9:46 AM

Meinersbur added inline comments. Mar 11 2021, 2:43 PM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1257–1258	The InsertionPoint part is ignored anyway. It would make more sense to only pass the DebugLoc, like `tileLoops`.
1293–1296	For `schedule(dynamic)`, the default chunk size is indeed one.
1300–1301	These could just be added to `OMPConstants.h`; it is not necessary to involve tablegen. `kmp_sched` in `kmp.h` is already redundant with `omp_sched` from omp.h. To not introduce dependency issues atm, I suggest just reproducing them in `OMPConstants.h` with a comment remarking that they have to be kept in sync. However, I found that `libomptarget/plugins/amdgpu/src/rtl.cpp` already includes from libLLVMFrontend.
1329	[style] LLVM's coding style does not use "Almost Always Auto". Consider using `getHeader()->front()`.
1330–1334	`cast` already contains an assertion if the argument is not the casted-to type.
1345–1346	Comment is outdated? No CanonicalLoopInfo needs to be preserved.
1382–1384	Consider extracting all the info you need from CanonicalLoopInfo at the beginning and then abandoning the structure, since starting from the first CFG modification, it does not describe a canonical loop anymore but methods such as `getAfterIP()` may assume so.

Harbormaster completed remote builds in B93332: Diff 330002. Mar 11 2021, 4:18 PM

In D97393#2585126, @Leporacanthicus wrote:

Please note that this is not intended as the final commit, but rather as a basis for asking for some advice on how to move forward.

The main stumbling point, which may be my lack of understanding of what it's supposed to do: the CanonicalLoopInfo assumes that the cond block has a CmpInst as the first instruction. In the dynamic case, the corresponding block [in my understanding] starts with a call instruction to fetch the "next" set of data to process. This causes the assertOK to fail, hence it is commented out on line 1300 in the patch.

This is no longer "not intended as the final commit". I'm not saying it's perfect, but review comments would be useful.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1250–1251	It is used (but not really NECESSARY to use!) to get the debug location in `getOrCreateSrcLocStr` a couple of lines down, which means more changes. I will look at doing this as a separate change, once I have something that works.
1262–1281	Yes, definitely on my mind. I was just concentrating on "get something that works first, then refactor" - otherwise, I find myself refactoring, and then reverting half of that, because it was the "wrong thing"... ;)

Just a quick comment. I will review in detail later.

This patch connects MLIR lowering to LLVM IR using the OpenMPIRBuilder. We should add a test for that flow or remove that connection in this patch.

Once the pretty-printer and parser patch (https://reviews.llvm.org/D92327) lands, it will become easier to write tests.

Meinersbur added inline comments. Mar 15 2021, 10:00 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1250–1251	getOrCreateSrcLocStr uses the InsertionPoint only as backup for the function name. Since it returns a default location when DebugLoc is not available, I don't see in what circumstances it would ever be used. In any case, there should be a version of getOrCreateSrcLocStr that only needs a DebugLoc and possibly a `llvm::Function` which can be obtained from the CanonicalLoopInfo stored BasicBlocks. I am OK with doing it as a separate change; I was already considering it myself.
1262–1281	This is fine for your personal workflow, but for committing a patch we should aim for clean source code in main. For instance, I often use `auto` locally but to comply with the LLVM coding standard, I have to replace most of them with the actual type.

I'm going to upload the fixes once they compile, but it's getting late, so probably not until tomorrow morning.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1283–1284	It appears, from what I can tell, we ALWAYS get one in the chunk-size. I will try to fix this in a bit. I have hand-coded different values in the LLVM IR to check that the tests I have [written in Fortran] behave correctly. Proper tests written in either MLIR or Fortran would be needed for this in the future, as would tests that check, for example, that the chunk size arrives at this section correctly.
1300–1301	I have moved a minimal set of constants.
1345–1346	Yes, and since we're adding zero to IV, it's not much point in updating it, so I removed the whole block below too - as far as I can tell, it produces the same result.

Fixed various review comments.

Harbormaster completed remote builds in B94037: Diff 330964. Mar 16 2021, 7:19 AM

Meinersbur added inline comments. Mar 18 2021, 8:23 AM

llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
109–121	IMHO we should copy&paste the kmp_schedule enum here entirely, with a comment that it must be kept in sync. Eventually `kmp.h` should include `OMPConstants.h` and use the declaration here instead of declaring its own.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1283–1284	Consider adding a test to OpenMPIRBuilderTest.cpp that inserts a non-one chunk size, or even a non-constant chunk-size.
1340	Typo?

Fixes as per review comments

Leporacanthicus added inline comments. Mar 22 2021, 11:32 AM

llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
109–121	I have added a comment to say that this needs to match kmp.h. I did look at copying the whole set of enum values, but I think that's better done at a later stage - I don't feel I understand exactly how these things are being used, and that would be key to how they get organized in a move. [Yes, could just copy the whole thing, but I'm not convinced that is the BEST choice - there appears to be a secondary hierarchy in there]. I will revisit this in a future patch, after I have spent some time understanding all the uses of these enum values.

Harbormaster completed remote builds in B95049: Diff 332375. Mar 22 2021, 12:29 PM

Add testing for chunk size in dynamic work sharing loop test.

Also removed a superfluous assignment in the production code.

Harbormaster completed remote builds in B96281: Diff 334104. Mar 30 2021, 4:54 AM

Meinersbur added inline comments. Mar 31 2021, 1:12 PM

llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
109–121	Sounds reasonable.
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
377	[suggestion] Add a newline between the description and the beginning of params. Not sure whether it makes a difference for doxygen, but clang-format likes to redo paragraph line wrapping and would include \param into the paragraph.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1313	To avoid creating temporary strings.
1340	"Rejig" is not a typo? Wiktionary knows it, but UK-only.
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1734	I assumed you would add a second test when chunk is non-default. However, the default chunk size of 1 is not special, so I am fine with this.
1741–1744	[nit] Can use `auto *` when using `dyn_cast` on the same line. At least, please use consistent style in the same function.
1780–1782	You could just use `cast` instead of `dyn_cast`, which asserts if it is the wrong type, so you do not need to explicitly check for nullptr.

Updates requested in review:

Avoid making temporary string
Avoid British English in comment
Be consistent and not mixing auto/named type in new tests

Harbormaster completed remote builds in B96753: Diff 334732. Apr 1 2021, 10:54 AM

Leporacanthicus added inline comments. Apr 1 2021, 11:07 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1340	Changing to "modify" to be more universal. Living in the UK, I'm not always aware of what English words are "UK" only and which are "any English variety".
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1734	I am in the process of adding fortran-compiler tests that do dynamic with no chunk size and static with a given chunk size, to cover more - but won't go into llvm main for a while, I guess.
1741–1744	The whole file is inconsistent in this respect. First dyn_cast in the file is to a named type (line 156), the second one is to an auto (line 161), but then it is MOSTLY named types, with a scattering of auto in the original static code. I'm changing the ones below to use non-auto, to match the rest of the file.
1780–1782	Again, this matches the rest of the file - there is a small number of `cast<X>(y)`, but they are all preceded by an `ASSERT_TRUE(isa<X>(y))`, which is about the same as this construct - the `isa<X>` check may explain more clearly what's wrong, but I doubt anyone working on LLVM for more than a few days will struggle to understand why a `dyn_cast` returned a `nullptr`. I think the assert you get inside the `cast<X>` is less clear - if nothing else, in the sense that you probably can't set a breakpoint and then inspect what happened in a good way.
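
The trade-off debated on lines 1780–1782 can be illustrated with a standard-C++ analogue. This is a sketch only: `Node`, `CallNode`, `StoreNode`, `tryCast` and `checkedCast` are hypothetical names, and the sketch uses `dynamic_cast`, whereas LLVM's real `cast<>` and `dyn_cast<>` rely on LLVM-style RTTI. It only shows the difference between a cast that reports a mismatch as `nullptr` and one that asserts.

```cpp
#include <cassert>

// Hypothetical class hierarchy used only for this illustration.
struct Node {
  virtual ~Node() = default;
};
struct CallNode : Node {};
struct StoreNode : Node {};

// Analogue of dyn_cast<>: yields nullptr on a type mismatch, so the caller
// (for example a unit test) must check the result explicitly.
template <typename To, typename From> To *tryCast(From *value) {
  return dynamic_cast<To *>(value);
}

// Analogue of cast<>: the type check happens inside the helper, so a mismatch
// aborts there (in builds with assertions enabled) rather than at the call
// site.
template <typename To, typename From> To *checkedCast(From *value) {
  To *result = dynamic_cast<To *>(value);
  assert(result && "checkedCast<To>() argument of incompatible type");
  return result;
}
```

With the first form, a test can report the failure at its own line by checking the returned pointer; with the second form, the failure surfaces inside the helper, which is the debugging concern raised in the reply above.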

Could you add the following to the unittest?

Builder.restoreIP(EndIP);
Builder.CreateRetVoid();
OMPBuilder.finalize();
EXPECT_FALSE(verifyModule(*M, &errs()));

It checks whether the IR is internally consistent.

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1736–1737	Do not use `CLI` after it has been invalidated.
1741–1744	Reviews of previous patches were not necessarily perfect ;-(

For reference, the generation control flow:

Updates as per review comments:

Fetch from CLI before it is destroyed.
Add return to terminate block and then verifyModule

In D97393#2666954, @Meinersbur wrote:
Could you add the following to the unittest?
Builder.restoreIP(EndIP);
Builder.CreateRetVoid();
OMPBuilder.finalize();
EXPECT_FALSE(verifyModule(*M, &errs()));
It checks whether the IR is internally consistent.

Good spot. Done!

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1741–1744	Indeed. Problem comes when you copy-paste code from previously "not perfect reviews". I've been on the other end of this too. ;)

Harbormaster completed remote builds in B97681: Diff 336047. Apr 8 2021, 3:54 AM

LGTM, thank you.

Do you have commit right to push it by yourself?

This revision is now accepted and ready to land. Apr 15 2021, 4:02 PM

In D97393#2693200, @Meinersbur wrote:

Do you have commit right to push it by yourself?

Thanks @Meinersbur. I will submit this for Mats.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
382	Nit: insterted -> inserted

Closed by commit rG517c3aee4de5: [OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic (authored by MatsPetersson, committed by kiranchandramohan). Apr 16 2021, 8:10 AM

This revision was automatically updated to reflect the committed changes.

kiranchandramohan added a commit: rG517c3aee4de5: [OpenMP IRBuilder, MLIR] Add support for OpenMP do schedule dynamic.

Revision Contents

Path	Size
llvm/include/llvm/Frontend/OpenMP/OMPConstants.h	11 lines
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h	26 lines
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp	150 lines
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp	99 lines
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp	30 lines

Diff 338123

llvm/include/llvm/Frontend/OpenMP/OMPConstants.h

[... 100 lines collapsed ...]
 };

 inline std::string getAllAssumeClauseOptions() {
   std::string S;
   for (const AssumptionClauseMappingInfo &ACMI : AssumptionClauseMappings)
     S += (S.empty() ? "'" : "', '") + ACMI.Identifier.str();
   return S + "'";
 }

+/// \note This needs to be kept in sync with kmp.h enum sched_type.
+/// Todo: Update kmp.h to include this file, and remove the enums in kmp.h
+/// To complete this, more enum values will need to be moved here.
+enum class OMPScheduleType {
+  Static = 34, /**< static unspecialized */
+  DynamicChunked = 35,
+  ModifierNonmonotonic =
+      (1 << 30), /**< Set if the nonmonotonic schedule modifier was present */
+  LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue */ ModifierNonmonotonic)
+};
+
 } // end namespace omp

Meinersbur (Not Done): IMHO we should copy&paste the kmp_schedule enum here entirely, with a comment that it must be kept in sync. Eventually `kmp.h` should include `OMPConstants.h` and use the declaration here instead of declaring its own.

Leporacanthicus (Author, Done): I have added a comment to say that this needs to match kmp.h. I did look at copying the whole set of enum values, but I think that's better done at a later stage - I don't feel I understand exactly how these things are being used, and that would be key to how they get organized in a move. [Yes, could just copy the whole thing, but I'm not convinced that is the BEST choice - there appears to be a secondary hierarchy in there]. I will revisit this in a future patch, after I have spent some time understanding all the uses of these enum values.

Meinersbur (Not Done): Sounds reasonable.

 } // end namespace llvm

 #endif // LLVM_FRONTEND_OPENMP_OMPCONSTANTS_H
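
The enum above mixes base schedule kinds (34, 35) with a modifier flag in bit 30. A minimal sketch of how such an encoding can be combined and taken apart follows. The numeric values are copied from the diff; `ScheduleType`, `raw`, `withNonmonotonic`, `baseSchedule` and `isNonmonotonic` are hypothetical names that are not part of `OMPConstants.h`, and the sketch avoids `LLVM_MARK_AS_BITMASK_ENUM` so that it stays self-contained.

```cpp
#include <cstdint>

// Values copied from the enum in this diff; everything else in this block is
// a hypothetical illustration.
enum class ScheduleType : std::uint32_t {
  Static = 34,
  DynamicChunked = 35,
  ModifierNonmonotonic = 1u << 30,
};

// Underlying integer value, as it would be passed to a runtime call.
constexpr std::uint32_t raw(ScheduleType schedule) {
  return static_cast<std::uint32_t>(schedule);
}

// Combine a base schedule kind with the nonmonotonic modifier bit.
constexpr std::uint32_t withNonmonotonic(ScheduleType base) {
  return raw(base) | raw(ScheduleType::ModifierNonmonotonic);
}

// Strip the modifier bit again to recover the base schedule kind.
constexpr ScheduleType baseSchedule(std::uint32_t encoded) {
  return static_cast<ScheduleType>(
      encoded & ~raw(ScheduleType::ModifierNonmonotonic));
}

// Test whether the modifier bit is set in an encoded schedule value.
constexpr bool isNonmonotonic(std::uint32_t encoded) {
  return (encoded & raw(ScheduleType::ModifierNonmonotonic)) != 0;
}
```

Because the base kinds occupy the low bits and the modifier sits in bit 30, the two can be OR-ed together without colliding. This separation of kinds and modifiers may be the kind of "secondary hierarchy" mentioned in the reply above, though that reading is an assumption.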

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

[... 349 lines collapsed ...] public:
   /// the current thread, updates the relevant instructions in the canonical
   /// loop and calls to an OpenMP runtime finalization function after the loop.
   ///
   /// \param Loc The source location description, the insertion location
   /// is not used.
   /// \param CLI A descriptor of the canonical loop to workshare.
   /// \param AllocaIP An insertion point for Alloca instructions usable in the
   /// preheader of the loop.
-  /// \param NeedsBarrier Indicates whether a barrier must be insterted after
+  /// \param NeedsBarrier Indicates whether a barrier must be inserted after
   /// the loop.
   /// \param Chunk The size of loop chunk considered as a unit when
   /// scheduling. If \p nullptr, defaults to 1.
   ///
   /// \returns Updated CanonicalLoopInfo.
   CanonicalLoopInfo *createStaticWorkshareLoop(const LocationDescription &Loc,
                                                CanonicalLoopInfo *CLI,
                                                InsertPointTy AllocaIP,
                                                bool NeedsBarrier,
                                                Value *Chunk = nullptr);

+  /// Modifies the canonical loop to be a dynamically-scheduled workshare loop.
+  ///
+  /// This takes a \p LoopInfo representing a canonical loop, such as the one
+  /// created by \p createCanonicalLoop and emits additional instructions to
+  /// turn it into a workshare loop. In particular, it calls to an OpenMP
+  /// runtime function in the preheader to obtain, and then in each iteration
+  /// to update the loop counter.
+  /// \param Loc The source location description, the insertion location

Meinersbur (Not Done): [suggestion] Add a newline between the description and the beginning of params. Not sure whether it makes a difference for doxygen, but clang-format likes to redo paragraph line wrapping and would include \param into the paragraph.

+  /// is not used.
+  /// \param CLI A descriptor of the canonical loop to workshare.
+  /// \param AllocaIP An insertion point for Alloca instructions usable in the
+  /// preheader of the loop.
+  /// \param NeedsBarrier Indicates whether a barrier must be insterted after

kiranchandramohan (Not Done): Nit: insterted -> inserted

+  /// the loop.
+  /// \param Chunk The size of loop chunk considered as a unit when
+  /// scheduling. If \p nullptr, defaults to 1.
+  ///
+  /// \returns Point where to insert code after the loop.
+  InsertPointTy createDynamicWorkshareLoop(const LocationDescription &Loc,
+                                           CanonicalLoopInfo *CLI,
+                                           InsertPointTy AllocaIP,
+                                           bool NeedsBarrier,
+                                           Value *Chunk = nullptr);
+
   /// Modifies the canonical loop to be a workshare loop.
   ///
   /// This takes a \p LoopInfo representing a canonical loop, such as the one
   /// created by \p createCanonicalLoop and emits additional instructions to
   /// turn it into a workshare loop. In particular, it calls to an OpenMP
   /// runtime function in the preheader to obtain the loop bounds to be used in
   /// the current thread, updates the relevant instructions in the canonical
   /// loop and calls to an OpenMP runtime finalization function after the loop.
[... 590 lines collapsed ...]

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

[... 1,162 lines collapsed ...] CanonicalLoopInfo *OpenMPIRBuilder::createStaticWorkshareLoop(
   Builder.CreateStore(UpperBound, PUpperBound);
   Builder.CreateStore(One, PStride);
   if (!Chunk)
     Chunk = One;
   Value *ThreadNum = getOrCreateThreadID(SrcLoc);
-  // TODO: extract scheduling type and map it to OMP constant. This is curently
-  // happening in kmp.h and its ilk and needs to be moved to OpenMP.td first.
-  constexpr int StaticSchedType = 34;
-  Constant *SchedulingType = ConstantInt::get(I32Type, StaticSchedType);
+  Constant *SchedulingType =
+      ConstantInt::get(I32Type, static_cast<int>(OMPScheduleType::Static));
   // Call the "init" function and update the trip count of the loop with the
   // value it produced.
   Builder.CreateCall(StaticInit,
                      {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound,
                       PUpperBound, PStride, One, Chunk});
   Value *LowerBound = Builder.CreateLoad(IVTy, PLowerBound);
   Value *InclusiveUpperBound = Builder.CreateLoad(IVTy, PUpperBound);
[... 28 lines collapsed ...] CanonicalLoopInfo *OpenMPIRBuilder::createStaticWorkshareLoop(
   CLI->assertOK();
   return CLI;
 }

 CanonicalLoopInfo *OpenMPIRBuilder::createWorkshareLoop(
     const LocationDescription &Loc, CanonicalLoopInfo *CLI,
     InsertPointTy AllocaIP, bool NeedsBarrier) {
   // Currently only supports static schedules.

jdoerfert (Done): I think we use doxygen comments `///` for static functions as well.

Leporacanthicus (Author, Not Done): No problem, I will update - although the other functions in this file aren't using that style (see line 1054-1057) - should I fix all of them (in a separate commit, I expect?) or just the two new functions?

   return createStaticWorkshareLoop(Loc, CLI, AllocaIP, NeedsBarrier);
 }

+/// Returns an LLVM function to call for initializing loop bounds using OpenMP
+/// dynamic scheduling depending on `type`. Only i32 and i64 are supported by
+/// the runtime. Always interpret integers as unsigned similarly to
+/// CanonicalLoopInfo.
+static FunctionCallee
+getKmpcForDynamicInitForType(Type *Ty, Module &M, OpenMPIRBuilder &OMPBuilder) {
+  unsigned Bitwidth = Ty->getIntegerBitWidth();
+  if (Bitwidth == 32)
+    return OMPBuilder.getOrCreateRuntimeFunction(
+        M, omp::RuntimeFunction::OMPRTL___kmpc_dispatch_init_4u);
+  if (Bitwidth == 64)
+    return OMPBuilder.getOrCreateRuntimeFunction(
+        M, omp::RuntimeFunction::OMPRTL___kmpc_dispatch_init_8u);
+  llvm_unreachable("unknown OpenMP loop iterator bitwidth");
+}
+
+/// Returns an LLVM function to call for updating the next loop using OpenMP
+/// dynamic scheduling depending on `type`. Only i32 and i64 are supported by
+/// the runtime. Always interpret integers as unsigned similarly to
+/// CanonicalLoopInfo.
+static FunctionCallee
+getKmpcForDynamicNextForType(Type *Ty, Module &M, OpenMPIRBuilder &OMPBuilder) {
+  unsigned Bitwidth = Ty->getIntegerBitWidth();
+  if (Bitwidth == 32)
+    return OMPBuilder.getOrCreateRuntimeFunction(
+        M, omp::RuntimeFunction::OMPRTL___kmpc_dispatch_next_4u);
+  if (Bitwidth == 64)
+    return OMPBuilder.getOrCreateRuntimeFunction(
+        M, omp::RuntimeFunction::OMPRTL___kmpc_dispatch_next_8u);
+  llvm_unreachable("unknown OpenMP loop iterator bitwidth");
+}
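
The two helpers above differ only in which runtime entry point they select for a given induction-variable width. A minimal standalone sketch of that selection, working on function names rather than on LLVM types, might look as follows. `DispatchKind` and `dispatchFunctionName` are hypothetical names introduced for illustration; the four runtime function names are the ones referenced by the `OMPRTL_` enumerators in the patch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical selector for the two kinds of dispatch call used by the patch.
enum class DispatchKind { Init, Next };

// Maps an induction-variable bit width to the name of the unsigned dispatch
// entry point: the "4u" variants take 32-bit values and the "8u" variants
// take 64-bit values. Other widths are rejected, mirroring the
// llvm_unreachable in the helpers above.
std::string dispatchFunctionName(DispatchKind kind, unsigned bitWidth) {
  const std::string stem = kind == DispatchKind::Init
                               ? "__kmpc_dispatch_init_"
                               : "__kmpc_dispatch_next_";
  switch (bitWidth) {
  case 32:
    return stem + "4u";
  case 64:
    return stem + "8u";
  default:
    throw std::invalid_argument("unknown OpenMP loop iterator bitwidth");
  }
}
```

Folding both helpers into one parameterized function like this is only one possible shape; the patch keeps them separate, which matches the existing static-schedule helper style in the file.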

Meinersbur (Not Done): The InsertionPointTy of `Loc` is not used with methods that apply to a canonical loop. Consider passing only the DebugLoc. The same seems to apply to `createStaticWorkshareLoop`.

Leporacanthicus (Author, Done): It is used (but not really NECESSARY to use!) to get the debug location in `getOrCreateSrcLocStr` a couple of lines down, which means more changes. I will look at doing this as a separate change, once I have something that works.

Meinersbur (Not Done): getOrCreateSrcLocStr uses the InsertionPoint only as backup for the function name. Since it returns a default location when DebugLoc is not available, I don't see in what circumstances it would ever be used. In any case, there should be a version of getOrCreateSrcLocStr that only needs a DebugLoc and possibly a `llvm::Function` which can be obtained from the CanonicalLoopInfo stored BasicBlocks. I am OK with doing it as a separate change; I was already considering it myself.

+OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createDynamicWorkshareLoop(
+    const LocationDescription &Loc, CanonicalLoopInfo *CLI,
+    InsertPointTy AllocaIP, bool NeedsBarrier, Value *Chunk) {
+  // Set up the source location value for OpenMP runtime.
+  Builder.SetCurrentDebugLocation(Loc.DL);

Meinersbur (Done): The InsertionPoint part is ignored anyway. It would make more sense to only pass the DebugLoc, like `tileLoops`.

+  Constant *SrcLocStr = getOrCreateSrcLocStr(Loc);
+  Value *SrcLoc = getOrCreateIdent(SrcLocStr);
+  // Declare useful OpenMP runtime functions.
+  Value *IV = CLI->getIndVar();
+  Type *IVTy = IV->getType();
+  FunctionCallee DynamicInit = getKmpcForDynamicInitForType(IVTy, M, *this);
+  FunctionCallee DynamicNext = getKmpcForDynamicNextForType(IVTy, M, *this);
+  // Allocate space for computed loop bounds as expected by the "init" function.
+  Builder.restoreIP(AllocaIP);
+  Type *I32Type = Type::getInt32Ty(M.getContext());
+  Value *PLastIter = Builder.CreateAlloca(I32Type, nullptr, "p.lastiter");
+  Value *PLowerBound = Builder.CreateAlloca(IVTy, nullptr, "p.lowerbound");
+  Value *PUpperBound = Builder.CreateAlloca(IVTy, nullptr, "p.upperbound");
+  Value *PStride = Builder.CreateAlloca(IVTy, nullptr, "p.stride");
+  // At the end of the preheader, prepare for calling the "init" function by
+  // storing the current loop bounds into the allocated space. A canonical loop
+  // always iterates from 0 to trip-count with step 1. Note that "init" expects
+  // and produces an inclusive upper bound.
+  BasicBlock *PreHeader = CLI->getPreheader();
+  Builder.SetInsertPoint(PreHeader->getTerminator());

MeinersburUnsubmitted

Done

Since these are shared with createStaticWorkshareLoop, did you consider extracting them into a common function?

LeporacanthicusAuthorUnsubmitted

Done

Yes, definitely on my mind. I was just concentrating on "get something that works first, then refactor" - otherwise, I find myself refactoring, and then reverting half of that, because it was the "wrong thing"... ;)


MeinersburUnsubmitted

Not Done

This is fine for your personal workflow, but for committing a patch we should aim for clean source code in main. For instance, I often use auto locally, but to comply with the LLVM coding standard I have to replace most of them with the actual type.

Constant *One = ConstantInt::get(IVTy, 1);

Builder.CreateStore(One, PLowerBound);

Value *UpperBound = CLI->getTripCount();

MeinersburUnsubmitted

Done

With a chunk size that is not one, you need at least two loops: one for all iterations of a chunk and a surrounding while-loop that asks the runtime for the next chunk.

This two-loop structure should also apply to the static schedule if the chunk is set and is greater than one. I don't see this in createStaticWorkshareLoop; it might be broken for these cases.

LeporacanthicusAuthorUnsubmitted

Done

It appears, from what I can tell, we ALWAYS get one in the chunk-size. I will try to fix this in a bit. I have hand-coded different values in the LLVM IR to check that the tests I have [written in Fortran] behave correctly. Proper tests written in either MLIR or Fortran will be needed for this in the future, including tests that check, for example, that the chunk size arrives at this section correctly.

MeinersburUnsubmitted

Done

Consider adding a test to OpenMPIRBuilderTest.cpp that inserts a non-one chunk size, or even a non-constant chunk-size.

Meinersbur: Consider adding a test to OpenMPIRBuilderTest.cpp that inserts a non-one chunk size, or even a…
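The two-loop shape discussed in this thread can be illustrated outside LLVM. The following is a minimal, self-contained sketch in plain C++, not the IR the patch generates: `MockDispatcher`, its `next` method, and `runChunkedLoop` are hypothetical names, and the dispatcher is a single-threaded stand-in for the runtime's "next" call that hands out 1-based, inclusive `[Lower, Upper]` chunks. The outer `while` loop asks for the next chunk; the inner `for` loop runs the iterations of that chunk, subtracting one from the lower bound to get back to a 0-based induction variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the OpenMP runtime's dynamic dispatcher.
// It hands out chunks of a 1-based, inclusive iteration space [1, TripCount].
struct MockDispatcher {
  uint32_t Next;      // First iteration not yet handed out.
  uint32_t TripCount; // Total number of iterations.
  uint32_t Chunk;     // Iterations per chunk.

  MockDispatcher(uint32_t TripCount, uint32_t Chunk)
      : Next(1), TripCount(TripCount), Chunk(Chunk) {
    assert(Chunk >= 1 && "chunk size must be at least one");
  }

  // Returns false when no work is left; otherwise stores an inclusive
  // [Lower, Upper] range for the caller to execute.
  bool next(uint32_t &Lower, uint32_t &Upper) {
    if (Next > TripCount)
      return false;
    Lower = Next;
    Upper = std::min(TripCount, Next + Chunk - 1);
    Next = Upper + 1;
    return true;
  }
};

// Runs the two-loop structure and records the visited 0-based iterations.
std::vector<uint32_t> runChunkedLoop(uint32_t TripCount, uint32_t Chunk) {
  std::vector<uint32_t> Visited;
  MockDispatcher Dispatcher(TripCount, Chunk);
  uint32_t Lower = 0, Upper = 0;
  // Outer loop: keep asking for chunks until the dispatcher reports none left.
  while (Dispatcher.next(Lower, Upper)) {
    // Inner loop: iterate over one chunk with a 0-based induction variable.
    for (uint32_t IV = Lower - 1; IV < Upper; ++IV)
      Visited.push_back(IV);
  }
  return Visited;
}
```

With a trip count of 21 and a chunk size of 7, the outer loop runs three times and the inner loop covers seven iterations each time. The sketch ignores overflow for trip counts near the maximum of the integer type.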

Builder.CreateStore(UpperBound, PUpperBound);

Builder.CreateStore(One, PStride);

BasicBlock *Header = CLI->getHeader();

BasicBlock *Exit = CLI->getExit();

BasicBlock *Cond = CLI->getCond();

InsertPointTy AfterIP = CLI->getAfterIP();

// The CLI will be "broken" in the code below, as the loop is no longer

// a valid canonical loop.

if (!Chunk)

MeinersburUnsubmitted

Done

For schedule(dynamic), the default chunk size is indeed one.


Chunk = One;

MeinersburUnsubmitted

Done

Builder.CreateCall(DynamicInit,

- {SrcLoc, ThreadNum, SchedulingType, Zero /* LastIter */,

- One /* LowerBound */, UpperBound, One});

+ {SrcLoc, ThreadNum, SchedulingType, /*LastIter=*/ Zero,

+ /*LowerBound=*/ One, UpperBound, One});

Value *LowerBound = Builder.CreateLoad(PLowerBound);

Usually the name of the parameter comes before the argument


Value *ThreadNum = getOrCreateThreadID(SrcLoc);

OMPScheduleType DynamicSchedType =

MeinersburUnsubmitted

Done

These could just be added to OMPConstants.h; it is not necessary to involve tablegen.

kmp_sched in kmp.h is already redundant with omp_sched from omp.h. To not introduce dependency issues at the moment, I suggest just reproducing them in OMPConstants.h with a comment remarking that they have to be kept in sync.

However, I found that libomptarget/plugins/amdgpu/src/rtl.cpp already includes from libLLVMFrontend.

LeporacanthicusAuthorUnsubmitted

Done

I have moved a minimal set of constants.


OMPScheduleType::DynamicChunked | OMPScheduleType::ModifierNonmonotonic;

Constant *SchedulingType =

MeinersburUnsubmitted

Done

With dynamic schedule, the trip count is not known in advance.

ConstantInt::get(I32Type, static_cast<int>(DynamicSchedType));
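To illustrate how a base schedule kind and a modifier combine into the single integer passed to the "init" call, here is a small standalone sketch. The enum name mirrors the patch, but the definition below is an assumption written for illustration: the numeric values (35 for dynamic chunked, bit 30 for the nonmonotonic modifier) are what I recall from the runtime's kmp.h and should be checked against it, and both `operator|` and `encodeDynamicSchedule` are hand-written hypothetical helpers rather than anything from OMPConstants.h.

```cpp
#include <cstdint>

// Illustrative copy only. The values are assumed to match the runtime's
// schedule enumeration and would have to be kept in sync with it by hand.
enum class OMPScheduleType : uint32_t {
  DynamicChunked = 35,
  ModifierNonmonotonic = 1u << 30,
};

// Hypothetical helper: combine a schedule kind with a modifier bit.
constexpr OMPScheduleType operator|(OMPScheduleType A, OMPScheduleType B) {
  return static_cast<OMPScheduleType>(static_cast<uint32_t>(A) |
                                      static_cast<uint32_t>(B));
}

// The integer that would be materialized as the scheduling-type constant.
constexpr uint32_t encodeDynamicSchedule() {
  return static_cast<uint32_t>(OMPScheduleType::DynamicChunked |
                               OMPScheduleType::ModifierNonmonotonic);
}
```

Because the modifier occupies a high bit, the base kind can still be recovered by masking off the modifier bits.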

// Call the "init" function.

Builder.CreateCall(DynamicInit,

{SrcLoc, ThreadNum, SchedulingType, /* LowerBound */ One,

UpperBound, /* step */ One, Chunk});

// An outer loop around the existing one.

BasicBlock *OuterCond = BasicBlock::Create(

PreHeader->getContext(), Twine(PreHeader->getName()) + ".outer.cond",

MeinersburUnsubmitted

Done

BasicBlock *OuterCond = BasicBlock::Create(

- PreHeader->getContext(), PreHeader->getName() + ".outer.cond",

+ PreHeader->getContext(), Twine(PreHeader->getName()) + ".outer.cond",

PreHeader->getParent());

To avoid creating temporary strings.


PreHeader->getParent());

// This needs to be 32-bit always, so can't use the IVTy Zero above.

Builder.SetInsertPoint(OuterCond, OuterCond->getFirstInsertionPt());

Value *Res =

Builder.CreateCall(DynamicNext, {SrcLoc, ThreadNum, PLastIter,

PLowerBound, PUpperBound, PStride});

Constant *Zero32 = ConstantInt::get(I32Type, 0);

Value *MoreWork = Builder.CreateCmp(CmpInst::ICMP_NE, Res, Zero32);

Value *LowerBound =

Builder.CreateSub(Builder.CreateLoad(IVTy, PLowerBound), One, "lb");

Builder.CreateCondBr(MoreWork, Header, Exit);

// Change PHI-node in loop header to use outer cond rather than preheader,

// and set IV to the LowerBound.

Instruction *Phi = &Header->front();

auto *PI = cast<PHINode>(Phi);

MeinersburUnsubmitted

Done

[style] LLVM's coding style does not use "Almost Always Auto"

Consider using getHeader()->front()

Meinersbur: [style] [[ https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more…

PI->setIncomingBlock(0, OuterCond);

PI->setIncomingValue(0, LowerBound);

// Then set the pre-header to jump to the OuterCond

Instruction *Term = PreHeader->getTerminator();

MeinersburUnsubmitted

Done

auto *Phi = &*CLI->getHeader()->begin();

- if (auto *PI = dyn_cast<PHINode>(Phi)) {

- PI->setIncomingBlock(0, OuterCond);

- PI->setIncomingValue(0, LowerBound);

- } else

- llvm_unreachable("Expected this to be a phi-node");

+ auto *PI = cast<PHINode>(Phi);

+ PI->setIncomingBlock(0, OuterCond);

+ PI->setIncomingValue(0, LowerBound);

// Then set the pre-header to jump to the OuterCond

cast already contains an assertion if the argument is not the casted-to type.

Meinersbur: `cast` already contains an assertion if the argument is not the casted-to type.
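The difference between the two forms in the suggestion above can be shown with a simplified, self-contained model. This is a sketch and not LLVM's implementation: `isaSketch`, `dynCastSketch`, and `castSketch` are hypothetical stand-ins for `isa`, `dyn_cast`, and `cast`, and the two-class hierarchy only mimics the `classof` idiom. The point is that the `dyn_cast`-style helper returns a null pointer the caller must test, while the `cast`-style helper asserts internally (in builds with assertions enabled), which makes an extra `else llvm_unreachable(...)` branch redundant.

```cpp
#include <cassert>

// Simplified stand-ins for an instruction hierarchy, for illustration only.
struct Instruction {
  enum Kind { PHI, Branch } K;
  explicit Instruction(Kind K) : K(K) {}
};
struct PHINode : Instruction {
  PHINode() : Instruction(PHI) {}
  static bool classof(const Instruction *I) { return I->K == PHI; }
};
struct BranchInst : Instruction {
  BranchInst() : Instruction(Branch) {}
  static bool classof(const Instruction *I) { return I->K == Branch; }
};

// Type test based on the classof idiom.
template <typename To> bool isaSketch(Instruction *I) { return To::classof(I); }

// Returns nullptr on mismatch, so the caller has to check the result.
template <typename To> To *dynCastSketch(Instruction *I) {
  return isaSketch<To>(I) ? static_cast<To *>(I) : nullptr;
}

// Asserts on mismatch, so no separate check is needed at the call site.
template <typename To> To *castSketch(Instruction *I) {
  assert(isaSketch<To>(I) && "cast to an incompatible type");
  return static_cast<To *>(I);
}
```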

auto *Br = cast<BranchInst>(Term);

jdoerfertUnsubmitted

Done

I don't know what this does but it looks brittle. We should have handles for all values, e.g., "the old condition", which we can remove if necessary by following the handle and the operands (with a single use).

LeporacanthicusAuthorUnsubmitted

Done

Agreed. There needs to be a better way to do this... ;)

What it "does" is: it removes the "instructions after the ones I added". What I would like to do is actually replace the existing cond with a new one[1] (and, assuming nothing else uses that original block, remove it). There's no good way to do that with the current interface on CanonicalLoopInfo, so I did this to at least get me to a BasicBlock that doesn't get rejected for having two different terminations (is that the right term?).

[1] That's what I think I want to do. My understanding, and this is mainly based on what clang++ generates for "the same" C++ code as the Fortran code I'm experimenting with here, is that my cond BB should contain a call to kmpc_dispatch_next, check the return value from that, and either exit or keep going. I may be completely wrong in this understanding. The original code, which I copied from the static variant, does a compare against the full loop count, but in dynamic the work is not necessarily finished in order. Happy to be corrected on these things; I'm new to both Fortran and OpenMP.

Br->setSuccessor(0, OuterCond);

// Modify the inner condition:

// * Use the UpperBound returned from the DynamicNext call.

// * Jump to the outer loop when done with one of the inner loops.

MeinersburUnsubmitted

Done

Typo?


MeinersburUnsubmitted

Done

"Rejig" is not a typo? Wiktionary knows it, but UK-only.

Meinersbur: "Rejig" is not a typo? [[ https://en.wiktionary.org/wiki/rejig | Wiktionary ]] knows it, but…

LeporacanthicusAuthorUnsubmitted

Done

Changing to "modify" to be more universal. Living in the UK, I'm not always aware of what English words are "UK" only and which are "any English variety".


Builder.SetInsertPoint(Cond, Cond->getFirstInsertionPt());

UpperBound = Builder.CreateLoad(IVTy, PUpperBound, "ub");

Instruction *Comp = &*Builder.GetInsertPoint();

auto *CI = cast<CmpInst>(Comp);

CI->setOperand(1, UpperBound);

// Redirect the inner exit to branch to outer condition.

MeinersburUnsubmitted

Done

Comment is outdated? No CanonicalLoopInfo needs to be preserved.


LeporacanthicusAuthorUnsubmitted

Done

Yes, and since we're adding zero to IV, there's not much point in updating it, so I removed the whole block below too - as far as I can tell, it produces the same result.

Instruction *Branch = &Cond->back();

auto *BI = cast<BranchInst>(Branch);

assert(BI->getSuccessor(1) == Exit);

BI->setSuccessor(1, OuterCond);

// Add the barrier if requested.

if (NeedsBarrier) {

Builder.SetInsertPoint(&Exit->back());

createBarrier(LocationDescription(Builder.saveIP(), Loc.DL),

omp::Directive::OMPD_for, /* ForceSimpleCall */ false,

/* CheckCancelFlag */ false);

}

return AfterIP;

}

/// Make \p Source branch to \p Target.
///
/// Handles two situations:
/// * \p Source already has an unconditional branch.
/// * \p Source is a degenerate block (no terminator because the BB is
///   the current head of the IR construction).
static void redirectTo(BasicBlock *Source, BasicBlock *Target, DebugLoc DL) {
  if (Instruction *Term = Source->getTerminator()) {
    auto *Br = cast<BranchInst>(Term);
    assert(!Br->isConditional() &&
           "BB's terminator must be an unconditional branch (or degenerate)");
    BasicBlock *Succ = Br->getSuccessor(0);
    Succ->removePredecessor(Source, /*KeepOneInputPHIs=*/true);
    Br->setSuccessor(0, Target);
    return;
  }

  auto *NewBr = BranchInst::Create(Target, Source);
  NewBr->setDebugLoc(DL);
}

/// Redirect all edges that branch to \p OldTarget to \p NewTarget. That is,

MeinersburUnsubmitted

Done

Consider extracting all the info you need from CanonicalLoopInfo at the beginning and then abandoning the structure, since starting from the first CFG modification, it does not describe a canonical loop anymore but methods such as getAfterIP() may assume so.


/// after this \p OldTarget will be orphaned.
static void redirectAllPredecessorsTo(BasicBlock *OldTarget,
                                      BasicBlock *NewTarget, DebugLoc DL) {
  for (BasicBlock *Pred : make_early_inc_range(predecessors(OldTarget)))
    redirectTo(Pred, NewTarget, DL);
}

/// Determine which blocks in \p BBs are reachable from outside and remove the

[...]

CallInst *OpenMPIRBuilder::createCachedThreadPrivate(
[...]
  Constant *SrcLocStr = getOrCreateSrcLocStr(Loc);
  Value *Ident = getOrCreateIdent(SrcLocStr);
  Value *ThreadId = getOrCreateThreadID(Ident);
  Constant *ThreadPrivateCache =
      getOrCreateOMPInternalVariable(Int8PtrPtr, Name);
  llvm::Value *Args[] = {Ident, ThreadId, Pointer, Size, ThreadPrivateCache};

  Function *Fn =
      getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_threadprivate_cached);

  return Builder.CreateCall(Fn, Args);
}

std::string OpenMPIRBuilder::getNameWithSeparators(ArrayRef<StringRef> Parts,
                                                   StringRef FirstSeparator,
                                                   StringRef Separator) {
  SmallString<128> Buffer;
[...]

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp

[...]
TEST_F(OpenMPIRBuilderTest, StaticWorkShareLoop) {
[...]
  // The exit block should contain the "fini" call and the barrier call,
  // plus the call to obtain the thread ID.
  BasicBlock *ExitBlock = CLI->getExit();
  size_t NumCallsInExitBlock =
      count_if(*ExitBlock, [](Instruction &I) { return isa<CallInst>(I); });
  EXPECT_EQ(NumCallsInExitBlock, 3u);
}

TEST_F(OpenMPIRBuilderTest, DynamicWorkShareLoop) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  IRBuilder<> Builder(BB);
  OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});

  Type *LCTy = Type::getInt32Ty(Ctx);
  Value *StartVal = ConstantInt::get(LCTy, 10);
  Value *StopVal = ConstantInt::get(LCTy, 52);
  Value *StepVal = ConstantInt::get(LCTy, 2);
  Value *ChunkVal = ConstantInt::get(LCTy, 7);
  auto LoopBodyGen = [&](InsertPointTy, llvm::Value *) {};

  CanonicalLoopInfo *CLI = OMPBuilder.createCanonicalLoop(
      Loc, LoopBodyGen, StartVal, StopVal, StepVal,
      /*IsSigned=*/false, /*InclusiveStop=*/false);

  Builder.SetInsertPoint(BB, BB->getFirstInsertionPt());
  InsertPointTy AllocaIP = Builder.saveIP();

  // Collect all the info from CLI, as it isn't usable after the call to
  // createDynamicWorkshareLoop.
  InsertPointTy AfterIP = CLI->getAfterIP();

MeinersburUnsubmitted

Done

I assumed you would add a second test when chunk is non-default. However, the default chunk size of 1 is not special, so I am fine with this.

LeporacanthicusAuthorUnsubmitted

Done

I am in the process of adding fortran-compiler tests that do dynamic with no chunk size and static with a given chunk size, to cover more - but won't go into llvm main for a while, I guess.

  BasicBlock *Preheader = CLI->getPreheader();
  BasicBlock *ExitBlock = CLI->getExit();
  Value *IV = CLI->getIndVar();

MeinersburUnsubmitted

Done

Do not use `CLI` after it has been invalidated.

  InsertPointTy EndIP =
      OMPBuilder.createDynamicWorkshareLoop(Loc, CLI, AllocaIP,
                                            /*NeedsBarrier=*/true, ChunkVal);
  // The returned value should be the "after" point.
  ASSERT_EQ(EndIP.getBlock(), AfterIP.getBlock());
  ASSERT_EQ(EndIP.getPoint(), AfterIP.getPoint());

MeinersburUnsubmitted

Done

[nit] Can use `auto *` when using `dyn_cast` on the same line. At least, please use consistent style in the same function.

LeporacanthicusAuthorUnsubmitted

Done

The whole file is inconsistent in this respect. First dyn_cast in the file is to a named type (line 156), the second one is to an auto (line 161), but then it is MOSTLY named types, with a scattering of auto in the original static code. I'm changing the ones below to use non-auto, to match the rest of the file.

MeinersburUnsubmitted

Not Done

Reviews of previous patches were not necessarily perfect ;-(

LeporacanthicusAuthorUnsubmitted

Done

Indeed. Problem comes when you copy-paste code from previously "not perfect reviews". I've been on the other end of this too. ;)

  auto AllocaIter = BB->begin();
  ASSERT_GE(std::distance(BB->begin(), BB->end()), 4);
  AllocaInst *PLastIter = dyn_cast<AllocaInst>(&*(AllocaIter++));
  AllocaInst *PLowerBound = dyn_cast<AllocaInst>(&*(AllocaIter++));
  AllocaInst *PUpperBound = dyn_cast<AllocaInst>(&*(AllocaIter++));
  AllocaInst *PStride = dyn_cast<AllocaInst>(&*(AllocaIter++));
  EXPECT_NE(PLastIter, nullptr);
  EXPECT_NE(PLowerBound, nullptr);
  EXPECT_NE(PUpperBound, nullptr);
  EXPECT_NE(PStride, nullptr);

  auto PreheaderIter = Preheader->begin();
  ASSERT_GE(std::distance(Preheader->begin(), Preheader->end()), 6);
  StoreInst *LowerBoundStore = dyn_cast<StoreInst>(&*(PreheaderIter++));
  StoreInst *UpperBoundStore = dyn_cast<StoreInst>(&*(PreheaderIter++));
  StoreInst *StrideStore = dyn_cast<StoreInst>(&*(PreheaderIter++));
  ASSERT_NE(LowerBoundStore, nullptr);
  ASSERT_NE(UpperBoundStore, nullptr);
  ASSERT_NE(StrideStore, nullptr);

  CallInst *ThreadIdCall = dyn_cast<CallInst>(&*(PreheaderIter++));
  ASSERT_NE(ThreadIdCall, nullptr);
  EXPECT_EQ(ThreadIdCall->getCalledFunction()->getName(),
            "__kmpc_global_thread_num");

  CallInst *InitCall = dyn_cast<CallInst>(&*PreheaderIter);

  ASSERT_NE(InitCall, nullptr);
  EXPECT_EQ(InitCall->getCalledFunction()->getName(),
            "__kmpc_dispatch_init_4u");
  EXPECT_EQ(InitCall->getNumArgOperands(), 7U);
  EXPECT_EQ(InitCall->getArgOperand(6),
            ConstantInt::get(Type::getInt32Ty(Ctx), 7));

  ConstantInt *OrigLowerBound =
      dyn_cast<ConstantInt>(LowerBoundStore->getValueOperand());
  ConstantInt *OrigUpperBound =

MeinersburUnsubmitted

Not Done

You could just use `cast` instead of `dyn_cast`, which asserts if it is the wrong type, so you do not need to explicitly check for nullptr.

LeporacanthicusAuthorUnsubmitted

Done

Again, this matches the rest of the file - there is a small number of `cast<X>(y)`, but they are all preceded by an `ASSERT_TRUE(isa<X>(y))`, which is about the same as this construct - the `isa<X>` check may explain more clearly what's wrong, but I doubt anyone working on LLVM for more than a few days will struggle to understand why a `dyn_cast` returned a `nullptr`. I think the assert you get inside the `cast<X>` is less clear - if nothing else, in the sense that you probably can't set a breakpoint and then inspect what happened in a good way.

      dyn_cast<ConstantInt>(UpperBoundStore->getValueOperand());
  ConstantInt *OrigStride =
      dyn_cast<ConstantInt>(StrideStore->getValueOperand());
  ASSERT_NE(OrigLowerBound, nullptr);
  ASSERT_NE(OrigUpperBound, nullptr);
  ASSERT_NE(OrigStride, nullptr);
  EXPECT_EQ(OrigLowerBound->getValue(), 1);
  EXPECT_EQ(OrigUpperBound->getValue(), 21);
  EXPECT_EQ(OrigStride->getValue(), 1);

  // The original loop iterator should only be used in the condition, in the
  // increment and in the statement that adds the lower bound to it.
  EXPECT_EQ(std::distance(IV->use_begin(), IV->use_end()), 3);

  // The exit block should contain the barrier call, plus the call to obtain
  // the thread ID.
  size_t NumCallsInExitBlock =
      count_if(*ExitBlock, [](Instruction &I) { return isa<CallInst>(I); });
  EXPECT_EQ(NumCallsInExitBlock, 2u);

  // Add a termination to our block and check that it is internally consistent.
  Builder.restoreIP(EndIP);
  Builder.CreateRetVoid();
  OMPBuilder.finalize();
  EXPECT_FALSE(verifyModule(*M, &errs()));
		}
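The bounds that the DynamicWorkShareLoop test expects in the preheader stores (lower bound 1, upper bound 21, stride 1) follow from normalizing the user loop, which runs from 10 up to but not including 52 in steps of 2, into a canonical loop with a trip count. The sketch below reproduces that arithmetic in plain C++. `tripCount` and `userValue` are hypothetical helper names, not OpenMPIRBuilder functions, and the code assumes an unsigned loop with a positive step and an exclusive stop value, as in the test.

```cpp
#include <cassert>
#include <cstdint>

// Number of iterations of `for (i = Start; i < Stop; i += Step)`.
uint32_t tripCount(uint32_t Start, uint32_t Stop, uint32_t Step) {
  assert(Step > 0 && "step must be positive");
  if (Stop <= Start)
    return 0;
  return (Stop - Start + Step - 1) / Step; // Ceiling division.
}

// Maps a 0-based logical iteration number back to the user's loop variable.
uint32_t userValue(uint32_t Start, uint32_t Step, uint32_t LogicalIV) {
  return Start + LogicalIV * Step;
}
```

For the values in the test, `tripCount(10, 52, 2)` is 21, so the 1-based inclusive range handed to the "init" call is 1 to 21, and the last logical iteration (20) corresponds to the user value 50.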

TEST_F(OpenMPIRBuilderTest, MasterDirective) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  F->setName("func");
  IRBuilder<> Builder(BB);

  OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
[...]

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

[...]
  convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,
[...]
    auto loop = cast<omp::WsLoopOp>(opInst);
    // TODO: this should be in the op verifier instead.
    if (loop.lowerBound().empty())
      return failure();

    if (loop.getNumLoops() != 1)
      return opInst.emitOpError("collapsed loops not yet supported");

-   if (loop.schedule_val().hasValue() &&
-       omp::symbolizeClauseScheduleKind(loop.schedule_val().getValue()) !=
-           omp::ClauseScheduleKind::Static)
-     return opInst.emitOpError(
-         "only static (default) loop schedule is currently supported");
+   bool isStatic = true;
+
+   if (loop.schedule_val().hasValue()) {
+     auto schedule =
+         omp::symbolizeClauseScheduleKind(loop.schedule_val().getValue());
+     if (schedule != omp::ClauseScheduleKind::Static &&
+         schedule != omp::ClauseScheduleKind::Dynamic)
+       return opInst.emitOpError("only static (default) and dynamic loop "
+                                 "schedule is currently supported");
+     isStatic = (schedule == omp::ClauseScheduleKind::Static);
+   }

    // Find the loop configuration.
    llvm::Value *lowerBound = moduleTranslation.lookupValue(loop.lowerBound()[0]);
    llvm::Value *upperBound = moduleTranslation.lookupValue(loop.upperBound()[0]);
    llvm::Value *step = moduleTranslation.lookupValue(loop.step()[0]);
    llvm::Type *ivType = step->getType();
    llvm::Value *chunk =
        loop.schedule_chunk_var()
[...]
    if (failed(bodyGenStatus))
      return failure();

    // TODO: get the alloca insertion point from the parallel operation builder.
    // If we insert the at the top of the current function, they will be passed as
    // extra arguments into the function the parallel operation builder outlines.
    // Put them at the start of the current block for now.
    llvm::OpenMPIRBuilder::InsertPointTy allocaIP(
        insertBlock, insertBlock->getFirstInsertionPt());
-   loopInfo = moduleTranslation.getOpenMPBuilder()->createStaticWorkshareLoop(
-       ompLoc, loopInfo, allocaIP, !loop.nowait(), chunk);
+   llvm::OpenMPIRBuilder::InsertPointTy afterIP;
+   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+   if (isStatic) {
+     loopInfo = ompBuilder->createStaticWorkshareLoop(ompLoc, loopInfo, allocaIP,
+                                                      !loop.nowait(), chunk);
+     afterIP = loopInfo->getAfterIP();
+   } else {
+     afterIP = ompBuilder->createDynamicWorkshareLoop(ompLoc, loopInfo, allocaIP,
+                                                      !loop.nowait(), chunk);
+   }

    // Continue building IR after the loop.
-   builder.restoreIP(loopInfo->getAfterIP());
+   builder.restoreIP(afterIP);
    return success();
  }

  namespace {

  /// Implementation of the dialect interface that converts operations belonging
  /// to the OpenMP dialect to LLVM IR.
  class OpenMPDialectLLVMIRTranslationInterface
[...]

Revision Contents

Diff 338123