The implementation supports static schedule for Fortran do loops. This
implements the dynamic variant of the same concept.
Please note that this is not intended as the final commit, but rather a basis for asking some advice on how to move forward.
The main stumbling point, which may be my lack of understanding of what it's supposed to do: the CanonicalLoopInfo assumes that the the cond block has a CmpInst as the first instruction. In the dynamic, the corresponding block [in my understanding] starts with a call instruction to fetch the "next" set of data to process. This causes the assertOK to fail, hence it is commented out on line 1300 in the patch.
I think we sue doxygen comments /// for static functions as well.
I don't know what this does but it looks brittle. We should have handles for all values, e.g., "the old condition", which we can remove is necessary by following the handle and the operands (with a single use).
The InsertionPointTy of Loc is not used with methods that apply to a canonical loop. Consider passing only the DebugLoc. The same seem to apply to createStaticWorkshareLoop.
Since these are shared with createStaticWorkshareLoop, did you consider extractin it into a common function?
Usually the name of the parameter comes before the argument
No problem, I will update - although the other functions in this file aren't using that style (see line 1054-1057) - should I fix all of them (in a separate commit, I expect?) or just the two new functions?
Agreed. There needs to be a better way to do this... ;)
What it "does" is: It removes the "instructions after the ones I added" . What I would like to do is actually replace the existing cond with a new one (and assuming nothing else uses that original block, remove it). There's no good way to do that with the current interface on CanonicalLoopInfo, so I did this to at least get me to a BasicBlock that doesn't get rejected for having two different terminations (is that the right term?).
 That's what I think I want to do. My understanding, and this is mainly based on what clang++ generates for "the same" C++ code to the Fortran code I'm experimenting with here, is that my cond BB should contain a call to the kmpc_dispatch_next and check the return value from that, and either exit or keep going - I may be completely wrong in this understanding. The original code, that I copied from the static variant is doing a compare to the full loop count, but in dynamic the work is not necessarily finished in order, happy to be corrected on these things, I'm new to both Fortran and OpenMP,
After applying worksharing-loop, the result is not a canonical loop nest anymore (one reason that the number of iterations is not known in advance). Hence, the CanonicalLoopInfo structure does not need to be preserved[*].
(*) I am working on making the chunk-loop accessible for loop transformations in OpenMP 6.0, which would require representing it as a CanonicalLoopInfo.
With a Chunk-size that is not one, you need at least two loops: One for all iterations of a chunk and another surrounding while-loop that ask the runtime for the next chunk.
This two-loop structure should also apply to the static schedule if the chunk is set and is greater than one. It don't see this in createStaticWorkshareLoop, it might be broken for these cases.
With dynamic schedule, the trip count is not known it advance.
The InsertionPoint part is ignored anyway. It would make more sense to only pass the DebugLoc, like tileLoops.
For schedule(dynamic), the default chunk size is indeed one.
The could just be added to OMPConstants.h, not necessary to involve tablegen.
kmp_sched in kmp.h is already redundant with omp_sched from omp.h. To not introduce dependency issues atm, I suggest to just reproduce it them OMPConstants.h with a comment to remarks they have to keep the in sync.
However, I found that libomptarget/plugins/amdgpu/src/rtl.cpp already includes from libLLVMFrontend.
Consider using getHeder()->front()
cast already contains an assertion if the argument is not the casted-to type.
Comment is outdated? No CanonicalLoopInfo needs to be preserved.
Consider extracting all the info you need from CanonicalLoopInfo at the beginning and then abandoning the structure, since starting from the first CFG modification, it does not describe a canonical loop anymore but methods such as getAfterIP() may assume so.
This is no longer a "not intented as final commit". I'm not saying it's perfect, but review comments would be useful.
It is used (but not really NECESSARY to use!) for to get the debug location in getOrCreateSrcLocStr a couple of lines down, which means more changes - I will look at doing this as a separate change, once I have something that works.
Yes, definitely on my mind. I was just concentrating on "get something that works first, then refactor" - otherwise, I find myself refactoring, and then reverting half of that, because it was the "wrong thing"... ;)
Just a quick comment. I will review in detail later.
This patch connects MLIR lowering to LLVM IR using the OpenMPIRBuilder. We should add a test for that flow or remove that connection in this patch.
Once the pretty-printer and parser patch (https://reviews.llvm.org/D92327) lands, it will become easier to write tests.
getOrCreateSrcLocStr uses the InsertionPoint only as backup for the function name. Since it returns a default location when DebugLoc is not available, I don't see in what circumstances it would ever be used. In any case, there should be a version of getOrCreateSrcLocStr that only needs a DebugLoc and possibly a llvm::Function which can be obtained from the CanonicalLoopInfo stored BasicBlocks.
I am OK with doing it with a separate change, I was considering myself already.
This is fine for your personal workflow, but for committing a patch we should aim for clean source code in main. For instance, I often use auto locally but to comply to the LLVM coding standard, I have to replace most of them with the actual type.
I'm going to upload the fixes once they compile, but it's getting late, so probably not until tomorrow morning.
It appears, from what I can tell, we ALWAYS get one in the chunk-size. I will try to fix this in a bit. I have hand-coded different values in the LLVM-IR to test that the tests I have [written in Fortran] to check that it behaves correctly. Proper tests written in either MLIR or Fortran would be needed for this in the future, and tests that check for example chunk size arrives to this section correctly.
I have moved a minimal set of constants.
Yes, and since we're adding zero to IV, it's not much point in updating it, so I removed the whole block below too - as far as I can tell, it produces the same result.
IMHO we should copy&paste the kmp_schedule enum here entirely, with a comment that it must be kept in sync.
Eventually kmp.h should include OMPConstants.h and use the declaration here instead of declaring its own.
Consider adding a test to OpenMPIRBuilderTest.cpp that inserts a non-one chunk size, or even a non-constant chunk-size.
I have added a comment to say that this needs to match kmp.h
I did look at copying the whole set of enum values, but I think that's better done at a later stage - I don't feel I understand exactly how these things are being used, and that would be key to how they get organized in a move. [Yes, could just copy the whole thing, but I'm not convinced that is the BEST choice - there appears to be a secondary hierarchy in there].
I will revisit this in a future patch, after I have spent some time understanding all the uses of these enum values.
[suggestion] Add a newline between the description and the beginning of params. Not sure whether it makes a difference for doxygen, but clang-format likes to redo paragraph line wrapping and would include \param into the paragraph.
To avoid creating temporary strings.
"Rejig" is not a typo? Wiktionary knows it, but UK-only.
I assumed you would add a second test when chunk is non-default. However, the default chunk size of 1 is not special, so I am find with this.
[nit] Can use auto * when using dyn_cast on the same line. At least, please use consistent style in the same function,
You could just use cast instead of dyn_cast which asserts if it is the wrong type, so you do not need explicitly check for nullptr.
Changing to "modify" to be more universal. Living in the UK, I'm not always aware of what English words are "UK" only and which are "any English variety".
I am in the process of adding fortran-compiler tests that do dynamic with no chunk size and static with a given chunk size, to cover more - but won't go into llvm main for a while, I guess.
The whole file is inconsistent in this respect. First dyn_cast in the file is to a named type (line 156), the second one is to an auto (line 161), but then it is MOSTLY named types, with a scattering of auto in the original static code. I'm changing the ones below to use non-auto, to match the rest of the file.
Again, this matches the rest of the file - there is a small number of cast<X>(y), but they are all preceded by an ASSERT_TRUE(isa<X>(y)), which is about the same as this construct - the isa<X> check may explain more clearly what's wrong. but I doubt anyone working on LLVM for more than a few days will struggle to understand why a dyn_cast returned a nullptr. I think the assert you get inside the cast<X> is less clear - if nothing else, in the sense that you probably can't set a breakpoint and then inspect what happened in a good way.
Could you add the following to the unittest?
Builder.restoreIP(EndIP); Builder.CreateRetVoid(); OMPBuilder.finalize(); EXPECT_FALSE(verifyModule(*M, &errs()));
I checks whether the IR is internally consistent.
Do not use CLI after it has been invalidated.
Reviews of previous patches were not necessirily perfect ;-(