Page MenuHomePhabricator

[mlir][ControlFlow] Add --strip-asserts pass
AbandonedPublic

Authored by springerm on Nov 18 2022, 4:27 AM.

Details

Summary

This pass removes cf.assert ops (for release builds). Also clarify the purpose and usage of this op in the documentation.

Diff Detail

Event Timeline

springerm created this revision.Nov 18 2022, 4:27 AM
springerm requested review of this revision.Nov 18 2022, 4:27 AM
silvas added a subscriber: herhut.EditedNov 18 2022, 4:53 AM

LGTM, but let's wait for somebody outside our chat thread to weigh in.

@herhut any thoughts about the cf.assert spec change? I think this is how the lower levels of the stack intended to use this, but wanted some extra feedback.

mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
49

nit: close backtick

Also, we should probably say something like

... with the `cf-strip-asserts` pass.

The pass name technically does not include the -- and we need to namespace it to the cf dialect.

mlir/include/mlir/Dialect/ControlFlow/Transforms/Passes.td
14

Does this have to be a function pass? I can imagine us using it in e.g. IREE when we have a more complex module/etc structure later in the pipeline. I think it can be a generic operation pass like canonicalize.

mehdi_amini requested changes to this revision.Nov 18 2022, 9:13 AM
mehdi_amini added inline comments.
mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
48

This is a big change in semantics, I would rather use another op with a less ambiguous naming.

I'm not sure how to name this though, it is quite unexpected to me to have an op in the IR where the semantics can change based on "release build or not". I would also question if this is the right dialect for this, seems like you think of "assert" as in C/C++ preprocessor macro, which I see as a "frontend feature". "Assert" to me is entirely disconnected from this.

This revision now requires changes to proceed.Nov 18 2022, 9:13 AM
silvas added inline comments.Nov 18 2022, 9:24 AM
mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
48

From my knowledge of the ecosystem I think most people were already assuming the semantics documented here because of the name's close connection with the C assert macro and similar concepts like the assert keyword in Python. And in fact were very confused when I told them it had the semantics documented before this patch.

I support keeping it with the new name and updated documented semantics for that reason.

Having an op in the IR with "release build or not" semantics does not sound wrong to me -- e.g. an MLIR-powered clang would probably want to create an op with that behavior, so that it could e.g. understand assert(false) as a replacement for __builtin_unreachable for control-flow-based warnings.

Also adding @ezhulenev who has systems that use this op

mehdi_amini added inline comments.Nov 18 2022, 9:32 AM
mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
48

From my knowledge of the ecosystem I think most people were already assuming the semantics documented here because of the name's close connection with the C assert macro and similar concepts like the assert keyword in Python. And in fact were very confused when I told them it had the semantics documented before this patch.

People coming with different background and making wrong assumption and not reading documentation is something that will always exist unfortunately. This is a weak argument for changing the semantics, at best finding a different name maybe.
While I agree that we aim to make everything self-explanatory, naming in CS is always a problem and has a lot of collision with different meaning across different domain or level of the stack.

Having an op in the IR with "release build or not" semantics does not sound wrong to me -- e.g. an MLIR-powered clang would probably want to create an op with that behavior, so that it could e.g. understand assert(false) as a replacement for __builtin_unreachable for control-flow-based warnings.

Yes: but this would be a frontend dialect implementing frontend semantics. The cf dialect is a lower level dialect from this point of view.

(also strictly speaking, I don't think C/C++ spec allows you to evaluate the assert expression when asserts are disabled, the user could be creative and log something there for example).

In xla we use assert as an error check like in this example:

%bool = call @foo()
assert %bool != false, "Oops, foo failed"

and rely on later lowering to a proper control flow:

%bool = call @foo()
cf.cond_br %bool, ^failed-assert, ^all-good
^failed-assert:
   call @runtimeSetError("Oops foo failed")

Also async dialect safely converts failed assertions to async.token and async.value errors.

But given that createStripAssertsPass not magically enabled in opt builds for all pipelines, it's ok, we'll just not be adding it. Inside xla I'd probably replace some assert ops with xla.check operation that has xla-specific semantics. And stripping asserts inside async functions/regions seems legit "release build optimization".

jpienaar added inline comments.
mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
48

I don't see IR here as representing release build or not. The op itself should have well defined semantics without considering external/non-IR captured context. In whatever release pipeline a given project constructs they can decide to add a pass to elide them (which could be this or another one that more aggressively deletes side effecting ops that only feed into asserts). Thats a property of that project and that pipeline IMHO and not of this op. E.g., I could decide print functions are for debugging only and elide them in my release build, but that doesn't change the semantics of print op either.

In this case this op may abort. One can run the pass to remove ops (which does not define the semantics of this op and should not be part of the description of this op), at that point all instances of this op has been deleted and its semantics doesn't matter. But that's the user that created a pipeline and decided that its acceptable to assume all assertions hold and need not be verified at runtime.

Agreed with Mehdi re naming and level.

52

This seems way too vague. "It may capture something sometime for someone that should be retained past some point". E.g., "don't optimize addi as we use it to identify induction variable" is "fine" for a given pipeline, it's not a property of the addi op and not something that should be pushed on general users. Most folks should use a loop construct instead rather than having such things in the more general parts (e.g., keep to frontend cleanup). Similarly here.

And again not semantics of the op, but some consideration of a specific pipeline/application that only uses this op. So shouldn't be documented on this op, but rather on the pipeline ("no asserts dropped until here").

seems like you think of "assert" as in C/C++ preprocessor macro, which I see as a "frontend feature".

Yes, a C++ assert is what I had in mind.

A concrete use case: memref.collapse_shape does not support collapsing non-contiguous dims. This can be checked in the verifier in case of static sizes/strides. In the dynamic case, we should generate some generate some kind of assert. At the moment, we just continue with the execution and probably crash later. Or even worse: Just compute a wrong result. This is really hard to debug. But we likely don't need/want these assertions in a release build. In this example, the assert is not a "frontend feature".

There may actually be a simple alternative for my use case that I did not think of earlier: We could have a runtimeAssert(Value) helper function in LLVMDialect which emits IR that aborts if the given i1 Value is 0. And a new mlir-opt flag controls if this function is a no-op or actually emits the check+abort. (Or should this be a Convert...ToLLVM pass option? But then we may need it on multiple passes.)

silvas added inline comments.Nov 21 2022, 3:23 AM
mlir/include/mlir/Dialect/ControlFlow/IR/ControlFlowOps.td
48

I don't consider this kind of monkey patching the pipeline to be good compiler design practice. The semantics of the program should be fixed by the frontend. We should not be relying on backends randomly dropping certain semantics for convenience.

We need two ops:

  1. "this op must absolutely terminate the program" and you are miscompiling if you elide it
  2. "this op is allowed to terminate the program, but not required to (with the implication that the termination might be useful for debugging/etc.)"

Only users (i.e. frontends) can "bless" the removal of an assertion by choosing op 2. I do not want the semantics of the IR produced by Torch-MLIR to depend on how a downstream happens to configure their pipeline. They simply do not have enough information to be able to do that correctly. For example, if a user writes if cond(): abort() I might want to lower that to cf.assert but the user may wants that treated separately from a shape check.

The current state is the some people are using the op for 1., and others for 2. we need to resolve that. I do not think it makes sense to add a pass that strips cf.assert if cf.assert has semantics 1.

springerm added a comment.EditedNov 21 2022, 3:36 AM

There may actually be a simple alternative for my use case that I did not think of earlier: We could have a runtimeAssert(Value) helper function in LLVMDialect which emits IR that aborts if the given i1 Value is 0. And a new mlir-opt flag controls if this function is a no-op or actually emits the check+abort. (Or should this be a Convert...ToLLVM pass option? But then we may need it on multiple passes.)

Clarification: The memref::CollapseShape to LLVM lowering pattern would directly call runtimeAssert, which generates the branch + call to abort. No cf.assert op is generated. The mlir-opt or pass flag just controls if runtimeAssert is a no-op. It does not affect the lowering of cf.assert.

The mlir-opt/pass flag can be compared to -DNDEBUG of a C/C++ compiler.

The downside of this approach is that runtime asserts can only be generated when lowering to LLVM. Not at a higher abstraction level / earlier in the pipeline. But this would be sufficient for my use case.

Here is the alternative implementation approach that directly generates abort calls when a debugging flag is enabled: D138444

At the moment, we just continue with the execution and probably crash later. Or even worse: Just compute a wrong result. This is really hard to debug. But we likely don't need/want these assertions in a release build. In this example, the assert is not a "frontend feature".

Right, this is what the middle end in general calls "UB".

What you're calling for here as I perceive it is a mode for the optimizer to insert an assertion every time something relying on UB is used, kind of a "UBSAN" mode here.
This is something interesting to consider, but I would see this implemented as pass which would materialize assertions for UB (through an op interface maybe?).

springerm abandoned this revision.Jan 5 2023, 1:26 AM