This is an archive of the discontinued LLVM Phabricator instance.

[Canonicalize] add options to control the region simplification with finer grain
AbandonedPublic

Authored by clementval on Sep 10 2021, 1:19 AM.

Details

Summary

This allows C++ clients of the Canonicalize pass to specify their own
Config option struct to control how region simoplification works.

This is useful for testing and for playing with the options. We have a use
case in FIR at the moment. This is adding finer grain options after the general
one added on D103069.

Diff Detail

Event Timeline

clementval created this revision.Sep 10 2021, 1:19 AM
clementval requested review of this revision.Sep 10 2021, 1:19 AM
mehdi_amini accepted this revision.Sep 10 2021, 12:23 PM

In general I'm not fond of the "hyper configurability" (increase cost of maintenance, harder to evolve, etc.), but this seems in line with the precedent with the GreedyRewriteConfig.

This revision is now accepted and ready to land.Sep 10 2021, 12:23 PM

I share the concern about hyper configurability, and I'm mildly opposed to this patch based on that. I'd love for @rriddle to weigh in when he is back.

If we decide we want this, then these options should be plumed through to the canonicalize pass options (like the rest of GreedyRewriteConfig is) and there should be a testcase that exercises these new things.

lattner requested changes to this revision.Sep 10 2021, 12:59 PM
This revision now requires changes to proceed.Sep 10 2021, 12:59 PM

I was trying to stay consistent with D102988 here, the options were plugged on the command line in D103069 but both of these didn't include testing.

For context, this comes from a discord thread https://discord.com/channels/636084430946959380/642426447167881246/885484565681041479 where it looks like some of the block merging behavior is undeisrable.

I don't have a discord acct, so I can't see the thread, but it sounds like potentially bigger issues with the design of the IR they're working on. I'd love to learn more about the issues

I don't have a discord acct, so I can't see the thread, but it sounds like potentially bigger issues with the design of the IR they're working on. I'd love to learn more about the issues

This is in the context of FIR. We have the following ops before the merge. op1 holds some constant information needed later in the translation of fir.coordinate_of

^bb1:
  %val1 = fir.op1 // op with some constant information
  %val2 = fir.coordinate_of %val1 // during the translation we need the info from %val1

After the merge we have the following were %val1 is not directly accessible with smth like getDefiningOp() or so. Maybe a long term solution would be to have some kind of trait on operation like op1 and make the merge block algorithm that these kind of value should not become block argument. I'm not sure if ConstantLike would already do that with the current algorithm.

%val1 = fir.op1 // op with some constant information
...
cond_br %17, ^bb1(%val1 : !fir.field), ^bb2
^bb1(%18: !fir.field):
  %val2 = fir.coordinate_of %18

I tried to summarize our problem but let me know if this is not clear and I can provide a bigger and more complete example.

ftynse added a comment.EditedSep 11 2021, 5:45 AM

it sounds like potentially bigger issues with the design of the IR they're working on.

+1, this was also my reaction in the thread.

That being said, I found this configurability extension consistent with the previous changes to the rewriter, same as Mehdi.

it sounds like potentially bigger issues with the design of the IR they're working on.

+1, this was also my reaction in the thread.

That being said, I found this configurability extension consistent with the previous changes to the rewriter, same as Mehdi.

In long term we would like to enable the block merging again. I think the options are nice to have to locate, test and resolve potential problems.

ftynse added a subscriber: silvas.Sep 11 2021, 10:06 AM

it sounds like potentially bigger issues with the design of the IR they're working on.

+1, this was also my reaction in the thread.

That being said, I found this configurability extension consistent with the previous changes to the rewriter, same as Mehdi.

In long term we would like to enable the block merging again. I think the options are nice to have to locate, test and resolve potential problems.

I can second the debugability argument here. There is also a feature request by @silvas to be able to disable folding in the rewriter.

Ok, I'm not going to stand in the way of this if others think it is a good idea - I agree that this has very low practical complexity cost. I'd still love to know what River thinks when he comes back.

bondhugula added inline comments.
mlir/include/mlir/Transforms/RegionUtils.h
20–31

While we obviously don't need to test various (2^3) combinations, we should add a test case at least for the "everything disabled" scenario via -canonicalize, i.e., something like:

-canonicalize='enable-region-simplifications=0'.

clementval added inline comments.Sep 13 2021, 12:43 AM
mlir/include/mlir/Transforms/RegionUtils.h
20–31

Sure I'll add a test.

Ok, I'm not going to stand in the way of this if others think it is a good idea - I agree that this has very low practical complexity cost. I'd still love to know what River thinks when he comes back.

Sure. There is no rush on this so we can wait for River's feedback.

clementval added inline comments.Sep 13 2021, 7:51 AM
mlir/include/mlir/Transforms/RegionUtils.h
20–31

@bondhugula It seems that the original option doesn't work right now. The block in the small snippet below is always merged.

func @mismatch_operands(%cond : i1, %arg0 : i32, %arg1 : i32) -> i32 {
  cond_br %cond, ^bb1, ^bb2

^bb1:
  return %arg0 : i32
^bb2:
  return %arg1 : i32
}
mlir-opt -allow-unregistered-dialect -pass-pipeline='func(canonicalize{region-simplify=false})'
mlir-opt -allow-unregistered-dialect -pass-pipeline='func(canonicalize{region-simplify=true})'
mlir-opt -allow-unregistered-dialect --canonicalize="region-simplify=true"
mlir-opt -allow-unregistered-dialect --canonicalize="region-simplify=false"

The two lines give the same result. Is there smth I'm not seeing here?

t seems that the original option doesn't work right now. The block in the small snippet below is always merged.

I didn't understand. You aren't able to control the region simplification?!

t seems that the original option doesn't work right now. The block in the small snippet below is always merged.

I didn't understand. You aren't able to control the region simplification?!

The command line option available on main doesn't change the behavior of the canonicalizer. The config class does.

Do we know when @rriddle is coming back?

If possible, I'd rather not go down the route of adding a config to simplifyRegions. Ideally, everything inside should be generally applicable and performant (still need a few fixes here IIRC). It seems like the problems arise with performing analyses after blocks have been merged, but it feels a tad fragile to rely on blocks not being merged, or more specifically in this case (I think) values not being block arguments. Are there ways we can make such analyses easier? I've seen these problems occur (asserting that information is coming from an operation instead of a block argument) outside of block merging, so it seems like a more general IR analysis issue.

I could see the options being exposed for debuggability, but I wouldn't encourage users to rely on these configs (I mean, I'd also love to kill some of the config options for the canonicalizer as well). For debugging these things, I'd likely use debug actions(+debug counters) instead of a config, https://mlir.llvm.org/docs/DebugActions/, as its easier to track down problems that way.

(My thoughts above stem from seeing compilers with an explosion of config options, which made those systems difficult to reason about. I'd like for MLIR to avoid that if possible, especially for the core things like canonicalization)

If possible, I'd rather not go down the route of adding a config to simplifyRegions. Ideally, everything inside should be generally applicable and performant (still need a few fixes here IIRC). It seems like the problems arise with performing analyses after blocks have been merged, but it feels a tad fragile to rely on blocks not being merged, or more specifically in this case (I think) values not being block arguments. Are there ways we can make such analyses easier? I've seen these problems occur (asserting that information is coming from an operation instead of a block argument) outside of block merging, so it seems like a more general IR analysis issue.

I could see the options being exposed for debuggability, but I wouldn't encourage users to rely on these configs (I mean, I'd also love to kill some of the config options for the canonicalizer as well). For debugging these things, I'd likely use debug actions(+debug counters) instead of a config, https://mlir.llvm.org/docs/DebugActions/, as its easier to track down problems that way.

(My thoughts above stem from seeing compilers with an explosion of config options, which made those systems difficult to reason about. I'd like for MLIR to avoid that if possible, especially for the core things like canonicalization)

Would there be a way to prevent ops holding constant information (required when translating to llvm ir) being transformed as block argument? Or are you suggesting we need to maybe merge our ops so the whole information in always contains in the last op and then doesn't require to find the defining op anymore?

So what is the way forward on this? As previously mentioned, we would be glad to find a solution where we can keep the merge identical block ON if possible. Might be easier when we are more advanced with the upstreaming of FIR.

Would there be a way to prevent ops holding constant information (required when translating to llvm ir) being transformed as block argument?
Or are you suggesting we need to maybe merge our ops so the whole information in always contains in the last op and then doesn't require to find the defining op anymore?

I think that is likely the suggestion here, because ops taking SSA value aren't really "ops holding constant information" in themselves.

Would there be a way to prevent ops holding constant information (required when translating to llvm ir) being transformed as block argument?
Or are you suggesting we need to maybe merge our ops so the whole information in always contains in the last op and then doesn't require to find the defining op anymore?

I think that is likely the suggestion here, because ops taking SSA value aren't really "ops holding constant information" in themselves.

The op that takes the SSA value doesn't hold constant information but the defining op of the SSA value does. Anyway I think it will still be useful to be able to avoid merging in some cases.

Ping. Can we have a final word whether the options can be added or not regardless of the potential IR problem that we might have. Sounds like they were comments for and comments against it but no final decision. If this is rejected, we have to find another solution to move forward in the short term.

mehdi_amini added a comment.EditedOct 6 2021, 7:57 PM

Named ping may be more effective here: @lattner @rriddle can you chime in and help @clementval here?

(I think this is on their path to upstream FIR code right now)

(I think this is on their path to upstream FIR code right now)

Right, we can apply the temporary fix in fir-dev right now but we will face the problem again when we finish the upstreaming. Once the upstreaming is complete it might also be easier to expose the problem we face and find a better solution than disabling canonicalizer simplification.

lattner resigned from this revision.Oct 7 2021, 2:19 PM

I'm resigning as a reviewer because I think taht's the only way to clear my "some changes required" bit in Phab. My opinion is that this is showing a bigger design problem with the dialect, and that these flags are adding parameterization that shouldn't be needed. That said, the patch is structured fine and if others think it is valuable, then I don't want to be in the way of progress here.

This revision is now accepted and ready to land.Oct 7 2021, 2:19 PM

I'll restate this case again in terms of an LLVM IR instruction, which if targeted, could exhibit the problem. We're a bit surprised no one else has run into this problem with lowering to LLVM IR.

Let's say we have some MLIR dialect op that we want to translate, in whole or in part, to

<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask>    ; yields <m x <ty>>

There is a restriction on this op that the third operand be a vector of constants or undefs of type i32. Our hypothetical dialect op, knowing this may create such a vector of constants using ConstantOp Values. This will seemingly work fairly well. However, block merging, if used, may decide to fuse two unrelated vectors of constants into block arguments of values.

At that point, one can no longer use the LLVM shufflevector instruction as it strictly requires constants.

In LLVM, a Constant is intrinsically hashconsed to be a unique Value and shared by users. They are not "merged" by placing phi nodes in the IR, as far as I know. The semantics in MLIR is slightly different for a ConstantOp. A ConstantOp is not shared, not hashcons unique, and treated exactly as any other operation that computes an ssa-value. Thus they are merged through block arguments and can even be mixed freely with computed values (non-constants).

In MLIR, there are two ways of specifying constants that I am aware of. ConstantOps, which have the above problems, and Attributes of constant data. Using ConstantOps is a clear win in a dialect design as it eliminates the need to factor an operation along the lines: some arguments must be values, other arguments can be constants or values, and yet others have to be constants. The latter case can neatly fall under Attributes, of course, but the middle case might require 2^n variations to implement if the high-level op has n arguments that may or may not be constants.

Of course, one of the core points of having a "high-level" op is, IMHO, to abstract away from implementation details.

By way of example then, our hypothetical dialect op may have to take multiple special forms.

%100 = xyz.foo %90, %92, %94, %95          // maybe can use shufflevector, but may use a different pattern even when it is unnecessary
%101 = xyz.foo %90, %92, [1 : i32, 2: i32]   // can use shufflevector

As already stated, this might result in an exponential set of cases (or distinct dialect ops) in the worst case, depending on what the op's semantics are. The end result being that using MLIR's ConstantOp becomes unattractive and something to avoid.

 %102 = xyz.foo.alt1 [4 : i64], %92, [1 : i32, 2: i32]
 %103 = xyz.foo.alt2 %90, [2.0 : f32, 4.0 : f32], [1 : i32, 2: i32]
 %104 = xyz.foo.alt3 [5 : i64, 10 : i64], [2.0 : f32, 4.0 : f32], [1 : i32, 2: i32]
 %105 = xyz.foo.alt4 [5 : i64, 10 : i64], [2.0 : f32, 4.0 : f32], %97, %97
...

You're making good point here and we sometimes struggled between the "AttributeOrValue" aspect.
However sometimes that you're not making clear here I feel is the "optimization vs semantics" aspect: fundamentally the "ConstantOp" design materialize a constant into an SSA value so that it can be used in a dynamic context.
In this sense the constant op addresses the 2^n problem you mentioned: everything is an SSA value, and pattern matching a constant is only an optimization.
But that requires to acknowledge that %100 = xyz.suffle %90, %92 may not be able to lower to the same construct if %92 isn't matching to a constant: it has to be an optimization.
From this point of view, the block-merging blocks this optimization but isn't a correctness problem, the way I see it is that in some case you want the opposition transformation, by analogy this is similar what things like function specialization is doing.

On the other hand the attributes are inherently non-dynamic and allow to make a constant integral part of the op definition.

Looks like it is now accepted. I'll keep it open until tomorrow just to be sure there is no last minute blocker.

rriddle requested changes to this revision.Oct 11 2021, 6:35 PM

I'm still not really on board with adding options like this, and I'd prefer we find a way to resolve without going that route. I don't think this is the right long term direction for the canonicalizer. I think part of the problems had here tie in with the insights that Mehdi mentions in his post, and I'd rather continue down that line of discussion.

This revision now requires changes to proceed.Oct 11 2021, 6:35 PM
clementval added a comment.EditedOct 12 2021, 5:15 AM

I'm still not really on board with adding options like this, and I'd prefer we find a way to resolve without going that route. I don't think this is the right long term direction for the canonicalizer. I think part of the problems had here tie in with the insights that Mehdi mentions in his post, and I'd rather continue down that line of discussion.

Ok, I guess for the time being we can live without those options if they are problematic to add. Since it was following exactly what D103069 did, I didn't expect it would be a problem. We will check internally what we can do here. Anyway, would you mind if we expose the functions used in simplifyRegions (eraseUnreachableBlocks, runRegionDCE and mergeIdenticalBlocks) in RegionUtils.h? Not that we do not want to find a solution where the full region simplification is enable but I think it would still be valuable to be able to run those separately.

@rriddle Whould that be ok to expose the functions mentioned in my previous comment. We are also looking at another solution where the full region simplification will be enable.

@rriddle Whould that be ok to expose the functions mentioned in my previous comment. We are also looking at another solution where the full region simplification will be enable.

Ping @rriddle

I'm still not really on board with adding options like this, and I'd prefer we find a way to resolve without going that route. I don't think this is the right long term direction for the canonicalizer. I think part of the problems had here tie in with the insights that Mehdi mentions in his post, and I'd rather continue down that line of discussion.

Ok, I guess for the time being we can live without those options if they are problematic to add. Since it was following exactly what D103069 did, I didn't expect it would be a problem. We will check internally what we can do here. Anyway, would you mind if we expose the functions used in simplifyRegions (eraseUnreachableBlocks, runRegionDCE and mergeIdenticalBlocks) in RegionUtils.h? Not that we do not want to find a solution where the full region simplification is enable but I think it would still be valuable to be able to run those separately.

What would we be exposing them for? I could also see a case for moving mergeIdenticalBlocks to CSE (not sure if that would break anything), given that seems more logically related than canonicalize.

What would we be exposing them for? I could also see a case for moving mergeIdenticalBlocks to CSE (not sure if that would break anything), given that seems more logically related than canonicalize.

We would like to use these functions inside a pass decoupled from the rest.

@rriddle So what do you think about it?

@rriddle So what do you think about it?

Sorry, keeping missing this in my queue. Alright. If they are treated as separable utilities that should be fine. The only concern I would have is if we created a "canonicalize, but not blah" pass that worked around something. That may be fine for a temporary thing, but I don't think that would be ideal.

clementval abandoned this revision.Nov 18 2021, 6:35 AM

@rriddle So what do you think about it?

Sorry, keeping missing this in my queue. Alright. If they are treated as separable utilities that should be fine. The only concern I would have is if we created a "canonicalize, but not blah" pass that worked around something. That may be fine for a temporary thing, but I don't think that would be ideal.

No worries. The goal is not to work around something as I mentioned previously. We will solve the initial problem we described in this patch. In the mean time having the possibility to use these function as separate utility is very useful. I made D114160 for this. I'll close this patch since we are not going this way anymore.