After invoking kmpc_for_static_init_4, values in omp.ub and omp.lb are bound by the bounds of the sequential loop. This information is passed to the IR by setting the !range metadata of the corresponding variables.
Details
Diff Detail
Event Timeline
This looks pretty good already. I inlined some comments. We also need negative tests, e.g., with non-constant loop bounds.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
46 | We should add a statistics tracker for annotated loops. | |
105 | The documentation here needs improvement. If the callback returns true the use will be deleted. Could you augment the documentation and make sure you don't cause uses to be removed from the cache. | |
218 | We still use capital first letters for characters, thus SetRangeCB and StaticForCall, ... | |
221 | We need names for these "magic constants" maybe some globals like this: static constexpr unsigned KMPC_FOR_STATIC_INIT_LB = 4; static constexpr unsigned KMPC_FOR_STATIC_INIT_UB = 5; | |
224 | Nit: I somehow would prefer nullptr instead of NULL. | |
249 | You need to verify there is only a single load/store and the value is an alloca without initializer and the type is as expected. | |
278 | Style: Early exists are easier to read, so: if (cond) { DEBUG(...); return; } | |
llvm/test/Transforms/OpenMP/set_bound_ranges.ll | ||
100 | I think you can remove the attributes and metadata above. |
Post review amends:
- Doesn't delete the use after adding the range
- Added a negative test
- Satisfies Linter
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
249 |
There might be multiple loads/store to %omp.ub arising from EnsureUpperBound. So, we only pick the store just before and the load just after the static_for invocation.
Is the "value" begin discussed here V? |
Seems like you uploaded a diff against your previous patch version. Also changes from OMPKinds.def are missing. Can you fix that?
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
222 | for (const Use &U : BoundVal->uses()) seems nicer. |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
47 | -for | |
206 | no else after return | |
211 | Either move these dyn_cast prior to the conditional and branch on Low and HighMinusOne (which are nullptr if they are not constant ints), or use cast here because you had a isa before. I would probably do the former. | |
219 | We should check that this is better than the existing metadata. | |
221 | move them into the class or global scope. | |
222 | Right. And if you only want the users: users(). | |
249 | If there are multiple, how do you ensure you pick up the right one? We probably need to find all and use the upper/lower bound of all stored values to be correct. We also need a test for multiple loads/stores. I mean we assume OMPLBVal and OMPUBVal are alloca's, right? We need to make sure we only pattern match a situation we can "control" |
Seems like you uploaded a diff against your previous patch version
Thanks @sstefan1, indeed I confused updated diff as being the iterative update. Should be fixed now.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
249 | Agreed, the current implementation is broken. I've added a FIXME for it. Looks like we should do some dataflow analysis to figure out the correct range.
Yep, made the assumption explicit in the latest revision. |
Sorry for the delay. I'm usually faster.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
210 | asserts need a message. | |
230 | In this patch or a follow up we need to do this for the other loop init runtime calls as well (there are 4 or 8 I think). | |
249 | No "FIXME" for correctness issues please. "FIXME" is fine for required improvements though. We should have a test with a FIXME as well to show what is missing. | |
311 | Style: I would prefer if (is..) { changed = true; ... } as it makes the code easier to read (for me). | |
313 | This will not count the right thing if the first loop is annotated but no other one. |
- Fixes correctness issues in calculating NumLoopsAnnotated
- Adds an error message to assert
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
249 | The "FIXME" was meant to be handled before getting this merged in. (this is in process of being solved, will ping once I've put in the solution) |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
245 |
I'm sorry, the comment is a bit misleading. I mean't that -- we support adding ranges to a simple but widely occurring case. In my experience, any pragmas that are translated through clang should have OMPLBVal/OMPUBVal as allocas. Will rephrase it. |
- Corrects the logic behind getting the ranges of stores to omp.(ub/lb).
- Corrects the comment explaining we only support adding range to cases when omp.(ub/lb) are allocas.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
230 | I would prefer I do that in a followup patch. It would be easier if we keep this patch only for checking the correctness of logic behind setting the range of bounds. |
The entire potentially reachable thing breaks down if you put the omp for into another loop, doesn't it? Everything becomes potentially reachable from everything.
The situation you are after is, as far as I understand, as follows:
store x in UB call __kmpc_for_init_4(...) v = load UB store min(v, x) in UB
correct?
Two alternative approaches:
- move the min computation (if necessary) into __kmpc_for_init_4 so we don't see it here. All we would see is a single store. (=make sure the call never returns a value higher than the given upper bound) [preferred if plausible]
- match the pattern manually. Two stores, one with a value the other with a select that implements a min. The load flowing into the min needs to (directly) follow the call, the compare etc. So the instruction sequence after the call needs to be
call void @__kmpc_for_static_init_4(%struct.ident_t* nonnull @0, i32 %0, i32 34, i32* nonnull %.omp.is_last, i32* nonnull %.omp.lb, i32* nonnull %.omp.ub, i32* nonnull %.omp.stride, i32 1, i32 1) %1 = load i32, i32* %.omp.ub, align 4 %2 = icmp slt i32 %1, 195 %cond = select i1 %2, i32 %1, i32 195 store i32 %cond, i32* %.omp.ub, align 4
My earlier suggestion with te must-be-executed-context was a generalization of 2). You can also do it more general without the context:
- Both stores are in the same block, on before one after the call, the second one stores the min expression, only the load that flows into the min is between the call and the second store.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
286 | I guess this can just be a member function. | |
llvm/test/Transforms/OpenMP/set_bound_ranges.ll | ||
3 | manually run -simplifycfg on this file to get rid of the control flow in favor of selects and then -instcombine to remove redundant loads. |
- Another attempt at making logic, for finding the right loads/stores, general enough to handle all the cases emitted by Clang.
- Now looks for loads/stores to omp.(ub.lb) only in the same basic block as '__kmpc_for_static_init_4'.
- Required slight changes in the OpenMP part of Clang to make sure that it fits this pattern.
- Included a test to check setting the ranges of an OpenMP for-directive that has compile-time constant bounds and is enclosed in an outer loop.
Thanks for taking a look at it!
The entire potentially reachable thing breaks down if you put the omp for into another loop, doesn't it? Everything becomes potentially reachable from everything.
Unfortunately yes, it would've led to not setting the ranges of any loads from omp.(lb/ub).
The situation you are after is, as far as I understand, as follows:
store x in UB call __kmpc_for_init_4(...) v = load UB store min(v, x) in UBcorrect?
Two alternative approaches:
- move the min computation (if necessary) into __kmpc_for_init_4 so we don't see it here. All we would see is a single store. (=make sure the call never returns a value higher than the given upper bound) [preferred if plausible]
The newest patch does some minor changes in Clang to have it covered in our pattern-matching, which turned out to be easier than this. Lmk if you have any problems with it.
llvm/test/Transforms/OpenMP/set_bound_ranges.ll | ||
---|---|---|
3 | Running these passes might not check the true IR fed from Clang to -openmpopt, which is the use case we are shooting for. |
clang/lib/CodeGen/CGStmtOpenMP.cpp | ||
---|---|---|
2739 ↗ | (On Diff #252958) | How did this help? There must be test changes in clang, right? |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
256 | Checking via operands of instructions that are skipped doesn't work in general. You could have a bitcast or anything that "hides" the pointer even though it is used. I think we should be more conservative as this is a pattern. Only skip loads and stores and only if the loaded/stored pointer is an alloca different from what what we look at. Everything else is a reason to bail |
clang/lib/CodeGen/CGStmtOpenMP.cpp | ||
---|---|---|
2739 ↗ | (On Diff #252958) | All the loads to %omp.(lb,ub), would now lie in the same BB as kmpc_for_static_init_4, fitting our pattern in OpenMPOpt.
Possibly, I should take a look on that front. (If we are looking for an easier pattern to match we would've incurred this problem one way or the other.) |
We should add a statistics tracker for annotated loops.