This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add !range metadata to loads from omp.(ub/lb)
Needs ReviewPublic

Authored by kaushikcfd on Mar 10 2020, 12:23 AM.

Details

Reviewers
jdoerfert
Summary

After invoking kmpc_for_static_init_4, values in omp.ub and omp.lb are bound by the bounds of the sequential loop. This information is passed to the IR by setting the !range metadata of the corresponding variables.

Diff Detail

Event Timeline

kaushikcfd created this revision.Mar 10 2020, 12:23 AM

This looks pretty good already. I inlined some comments. We also need negative tests, e.g., with non-constant loop bounds.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
48

We should add a statistics tracker for annotated loops.

107

The documentation here needs improvement. If the callback returns true the use will be deleted. Could you augment the documentation and make sure you don't cause uses to be removed from the cache.

220

We still use capital first letters for characters, thus SetRangeCB and StaticForCall, ...

223

We need names for these "magic constants" maybe some globals like this:

static constexpr unsigned KMPC_FOR_STATIC_INIT_LB = 4;
static constexpr unsigned KMPC_FOR_STATIC_INIT_UB = 5;
226

Nit: I somehow would prefer nullptr instead of NULL.

251

You need to verify there is only a single load/store and the value is an alloca without initializer and the type is as expected.

280

Style: Early exists are easier to read, so:

if (cond) {
  DEBUG(...);
  return;
}
llvm/test/Transforms/OpenMP/set_bound_ranges.ll
101

I think you can remove the attributes and metadata above.

Post review amends:

  • Doesn't delete the use after adding the range
  • Added a negative test
  • Satisfies Linter
kaushikcfd marked 8 inline comments as done.Mar 10 2020, 9:59 AM
kaushikcfd added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
251

verify there is only a single load/store

There might be multiple loads/store to %omp.ub arising from EnsureUpperBound. So, we only pick the store just before and the load just after the static_for invocation.

value is an alloca without initializer

Is the "value" begin discussed here V?

sstefan1 added a comment.EditedMar 10 2020, 10:30 AM

Seems like you uploaded a diff against your previous patch version. Also changes from OMPKinds.def are missing. Can you fix that?

sstefan1 added inline comments.Mar 10 2020, 10:46 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
224

for (const Use &U : BoundVal->uses()) seems nicer.

jdoerfert added inline comments.Mar 10 2020, 11:20 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
47

-for
+worksharing

208

no else after return

213

Either move these dyn_cast prior to the conditional and branch on Low and HighMinusOne (which are nullptr if they are not constant ints), or use cast here because you had a isa before. I would probably do the former.

221

We should check that this is better than the existing metadata.

223

move them into the class or global scope.

224

Right. And if you only want the users: users().

251

If there are multiple, how do you ensure you pick up the right one? We probably need to find all and use the upper/lower bound of all stored values to be correct. We also need a test for multiple loads/stores.

I mean we assume OMPLBVal and OMPUBVal are alloca's, right? We need to make sure we only pattern match a situation we can "control"

Fixes the "updated" diff for the previous version.

Seems like you uploaded a diff against your previous patch version

Thanks @sstefan1, indeed I confused updated diff as being the iterative update. Should be fixed now.

kaushikcfd marked 7 inline comments as done.

Address review comments. (thanks @jdoerfert , @sstefan1!)

kaushikcfd marked an inline comment as done.Mar 10 2020, 5:07 PM
kaushikcfd added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
251

Agreed, the current implementation is broken. I've added a FIXME for it. Looks like we should do some dataflow analysis to figure out the correct range.

I mean we assume OMPLBVal and OMPUBVal are alloca's, right?

Yep, made the assumption explicit in the latest revision.

Sorry for the delay. I'm usually faster.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
212

asserts need a message.

232

In this patch or a follow up we need to do this for the other loop init runtime calls as well (there are 4 or 8 I think).

251

No "FIXME" for correctness issues please. "FIXME" is fine for required improvements though.
So, the case we can handle correctly now we should. We should also detect if it is not one of them and give up, e.g., if there are more than a single load and store. I recommend just tracking all uses, determining if it is a load of the pointer or a store *to the pointer* (not of it), putting them in respective sets, and bailing on other uses. Then, inspect the loads/stores, for now just bail if the number is not one for each.

We should have a test with a FIXME as well to show what is missing.

313

Style: I would prefer if (is..) { changed = true; ... } as it makes the code easier to read (for me).

315

This will not count the right thing if the first loop is annotated but no other one.

kaushikcfd marked 4 inline comments as done.
  • Fixes correctness issues in calculating NumLoopsAnnotated
  • Adds an error message to assert
kaushikcfd marked an inline comment as not done.Mar 13 2020, 8:07 AM
kaushikcfd added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
251

The "FIXME" was meant to be handled before getting this merged in. (this is in process of being solved, will ping once I've put in the solution)

jdoerfert added inline comments.Mar 13 2020, 9:24 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
247

Style: I would move the comment before the if or add braces.

I'm curious, is this a widely occurring case? Do you know when it occurs in practice?

251

Sounds good :) I will monitor the reviews more closely now anyway.

kaushikcfd added inline comments.Mar 13 2020, 9:53 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
247

I'm curious, is this a widely occurring case? Do you know when it occurs in practice?

I'm sorry, the comment is a bit misleading. I mean't that -- we support adding ranges to a simple but widely occurring case. In my experience, any pragmas that are translated through clang should have OMPLBVal/OMPUBVal as allocas. Will rephrase it.

  1. Corrects the logic behind getting the ranges of stores to omp.(ub/lb).
  2. Corrects the comment explaining we only support adding range to cases when omp.(ub/lb) are allocas.
kaushikcfd marked 3 inline comments as done.Mar 15 2020, 12:58 PM
kaushikcfd added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
232

I would prefer I do that in a followup patch. It would be easier if we keep this patch only for checking the correctness of logic behind setting the range of bounds.

The entire potentially reachable thing breaks down if you put the omp for into another loop, doesn't it? Everything becomes potentially reachable from everything.
The situation you are after is, as far as I understand, as follows:

store x in UB
call __kmpc_for_init_4(...)
v = load UB
store min(v, x) in UB

correct?

Two alternative approaches:

  1. move the min computation (if necessary) into __kmpc_for_init_4 so we don't see it here. All we would see is a single store. (=make sure the call never returns a value higher than the given upper bound) [preferred if plausible]
  2. match the pattern manually. Two stores, one with a value the other with a select that implements a min. The load flowing into the min needs to (directly) follow the call, the compare etc. So the instruction sequence after the call needs to be
call void @__kmpc_for_static_init_4(%struct.ident_t* nonnull @0, i32 %0, i32 34, i32* nonnull %.omp.is_last, i32* nonnull %.omp.lb, i32* nonnull %.omp.ub, i32* nonnull %.omp.stride, i32 1, i32 1)
%1 = load i32, i32* %.omp.ub, align 4
%2 = icmp slt i32 %1, 195
%cond = select i1 %2, i32 %1, i32 195
store i32 %cond, i32* %.omp.ub, align 4

My earlier suggestion with te must-be-executed-context was a generalization of 2). You can also do it more general without the context:

  • Both stores are in the same block, on before one after the call, the second one stores the min expression, only the load that flows into the min is between the call and the second store.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
288

I guess this can just be a member function.

llvm/test/Transforms/OpenMP/set_bound_ranges.ll
3

manually run -simplifycfg on this file to get rid of the control flow in favor of selects and then -instcombine to remove redundant loads.

  • Another attempt at making logic, for finding the right loads/stores, general enough to handle all the cases emitted by Clang.
    • Now looks for loads/stores to omp.(ub.lb) only in the same basic block as '__kmpc_for_static_init_4'.
    • Required slight changes in the OpenMP part of Clang to make sure that it fits this pattern.
  • Included a test to check setting the ranges of an OpenMP for-directive that has compile-time constant bounds and is enclosed in an outer loop.
kaushikcfd marked 2 inline comments as done.Mar 26 2020, 1:00 PM

Thanks for taking a look at it!

The entire potentially reachable thing breaks down if you put the omp for into another loop, doesn't it? Everything becomes potentially reachable from everything.

Unfortunately yes, it would've led to not setting the ranges of any loads from omp.(lb/ub).

The situation you are after is, as far as I understand, as follows:

store x in UB
call __kmpc_for_init_4(...)
v = load UB
store min(v, x) in UB

correct?

Two alternative approaches:

  1. move the min computation (if necessary) into __kmpc_for_init_4 so we don't see it here. All we would see is a single store. (=make sure the call never returns a value higher than the given upper bound) [preferred if plausible]

The newest patch does some minor changes in Clang to have it covered in our pattern-matching, which turned out to be easier than this. Lmk if you have any problems with it.

llvm/test/Transforms/OpenMP/set_bound_ranges.ll
3

Running these passes might not check the true IR fed from Clang to -openmpopt, which is the use case we are shooting for.

jdoerfert added inline comments.Mar 27 2020, 10:52 AM
clang/lib/CodeGen/CGStmtOpenMP.cpp
2739 ↗(On Diff #252958)

How did this help? There must be test changes in clang, right?

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
258

Checking via operands of instructions that are skipped doesn't work in general. You could have a bitcast or anything that "hides" the pointer even though it is used. I think we should be more conservative as this is a pattern. Only skip loads and stores and only if the loaded/stored pointer is an alloca different from what what we look at. Everything else is a reason to bail

kaushikcfd added inline comments.Mar 27 2020, 11:12 AM
clang/lib/CodeGen/CGStmtOpenMP.cpp
2739 ↗(On Diff #252958)

All the loads to %omp.(lb,ub), would now lie in the same BB as kmpc_for_static_init_4, fitting our pattern in OpenMPOpt.

There must be test changes in clang, right?

Possibly, I should take a look on that front. (If we are looking for an easier pattern to match we would've incurred this problem one way or the other.)