Page MenuHomePhabricator

[mlir] Fix dumping invalid ops
ClosedPublic

Authored by sgrechanik on Jan 20 2022, 1:28 PM.

Details

Summary

This patch fixes the crash when printing some ops (like affine.for and
scf.for) when they are dumped in invalid state, e.g. during pattern
application. Now the AsmState constructor verifies the operation
first and switches to generic operation printing when the verification
fails. Also operations are now printed in generic form when emitting
diagnostics and the severity level is Error.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sgrechanik requested review of this revision.Jan 20 2022, 1:28 PM
rriddle requested changes to this revision.EditedJan 20 2022, 1:35 PM

We shouldn't be baking in verification checking into the custom printers. If anything we should have a dump/printer API that falls back to generic in the case of verification failure. It isn't feasible/maintainable to re-encode all of the verifier constraints, so we should keep custom printers clean from things they should be able to assume.

This revision now requires changes to proceed.Jan 20 2022, 1:35 PM

We can call opInfo->verifyInvariants(op) in OperationPrinter::printOperation and switch to generic printing on failure, something along the lines:

if (auto opInfo = op->getRegisteredInfo()) {
  // First verify the operation. If the operation fails to verify then the
  // custom print function may fail and we have to use the generic one.
  {
    // Ignore errors emitted by the verifier.
    ScopedDiagnosticHandler diagHandler(
        op->getContext(), [](Diagnostic &) { return success(); });
    if (failed(opInfo->verifyInvariants(op))) {
      printGenericOp(op, /*printOpName=*/true);
      return;
    }
  }
  opInfo->printAssembly(op, *this, defaultDialectStack.back());
  return;
}

However, doing this exposes some bugs in other ops. I'll try to investigate.

I talked about this with River in the past: I'd call the verifier on the entry point of the printer (the top-level operation to print), which will verify recursively the IR down. And switch to generically printing for the whole IR and not on an op-by-op basis.

sgrechanik retitled this revision from [mlir] Fix printing invalid scf and affine ops to [mlir] Fix dumping invalid ops.
sgrechanik edited the summary of this revision. (Show Details)

Not sure at which level to put the verification, Operation::dump seems to be a good candidate, it will fix the original crash and won't affect other kinds of printing.

rriddle added inline comments.Jan 25 2022, 4:24 PM
mlir/test/Conversion/AffineToStandard/lower-affine.mlir
1–2 ↗(On Diff #403069)

LG, except for these. I'm not sure we should be adding these to the tests. Realistically tests shouldn't be crashing with -debug, but we shouldn't need to add debug individually to all of the tests to verify that.

mehdi_amini requested changes to this revision.Jan 25 2022, 4:29 PM
mehdi_amini added inline comments.
mlir/lib/IR/AsmPrinter.cpp
2989

I think this should belong to the printer and not just "dump()"

The behavior does not seem desirable to be coupled to the llvm::errs() in itself. We don't want to crash when printing in general.
We had to hardcode this in the python bindings, but it fits better the framework itself.

This revision now requires changes to proceed.Jan 25 2022, 4:29 PM

Fixing this not only for dump uncovered a bug in verification of spirv::ConstantOp, which should be fixed first: https://reviews.llvm.org/D118939

jpienaar added inline comments.
mlir/lib/IR/AsmPrinter.cpp
2984

There is (I believe) still a change that an error can be emitted here in multi threaded environment, it would require a new handler to be set after this but before verify below. Not a high probability event though.

sgrechanik updated this revision to Diff 406636.Feb 7 2022, 3:51 PM
sgrechanik marked an inline comment as done and an inline comment as not done.
sgrechanik edited the summary of this revision. (Show Details)
  • Moved the check to Operation::print and Block::print
  • Removed error suppression: errors may be helpful.
mlir/lib/IR/AsmPrinter.cpp
2984

Hmm, yes, you are right, and I don't immediately see how to solve it.

But maybe actually we shouldn't try to suppress error messages here, because without error messages it's hard to tell why the printer switched to generic op printing.

2989

Moved it to Operation::print and Block::print

mlir/test/Conversion/AffineToStandard/lower-affine.mlir
1–2 ↗(On Diff #403069)

Do you have a better idea how to tests this?

mehdi_amini added inline comments.Feb 7 2022, 4:58 PM
mlir/lib/IR/AsmPrinter.cpp
2973

Please run git clang-format on your commit (arc should do it for you if clang-format is in the path I think)

2984

I wouldn't expect diagnostic about verification failure in a place where I call "print" though.

River: is there a safe way to silence these in the context of the printer?

3000

Is this check needed/useful?

Also you checked in the Operation::print is the generic printer flag isn't already set before verifying, should you do the same here?

rriddle added inline comments.Feb 7 2022, 5:06 PM
mlir/lib/IR/AsmPrinter.cpp
2984

Generally if you only want diagnostics for this thread, you should check the result of llvm::get_threadid() within the handler against the one for the main thread. Something like:

auto parentThreadId = llvm::get_threadid();
auto handlerFn = [&](Diagnostic &diag) {
  return success(parentThreadId == llvm::get_threadid());
};
sgrechanik updated this revision to Diff 407009.Feb 8 2022, 4:16 PM
  • Error suppression is back
  • fixed formatting errors
mlir/lib/IR/AsmPrinter.cpp
2984

Thanks, I'll use that.

3000

I inserted this check to avoid messing with diagnostic handling if the block is empty.

Here the flags are always the default ones. AsmState contains its own set of flags, but as far as I understand, they are not used for the actual printing. However, I wonder whether AsmState and OperationPrinter flags mismatch would cause any trouble. Maybe we shouldn't do the verification in functions taking AsmState, and verify in more higher level Operation::print(raw_ostream &os, const OpPrintingFlags &printerFlags) and Block::print(raw_ostream &os) instead?

rriddle added inline comments.Feb 8 2022, 4:29 PM
mlir/lib/IR/AsmPrinter.cpp
3000

It's generally too late to do verification if the AsmState already exists, the construction of that may call back into operations (and thus crash if invalid). I'd just verify in the non-AsmState functions.

Tangentially though, we should just expose the OpPrintingFlags held by the AsmState and use those (same applies for the other function).

sgrechanik added inline comments.Feb 9 2022, 6:15 PM
mlir/lib/IR/AsmPrinter.cpp
3000

Tangentially though, we should just expose the OpPrintingFlags held by the AsmState and use those (same applies for the other function).

Then it probably makes sense to do that and perform verification in the AsmState constructor. It doesn't seem to be a lot of code change.
However, I ran into an issue when a verifier tries to emit an error and print the op, and printing the op requires calling the verifier first, resulting in infinite recursion. I need to think how to solve it, something like temporary changing printOpOnDiagnostic, but there doesn't seem to be a thread-safe way to do it.

Ok, let's do it gradually and expose printing flags in AsmState first: https://reviews.llvm.org/D119870

sgrechanik edited the summary of this revision. (Show Details)

Rebased and moved the verification to the AsmState constructor.

At the python level, we have a flag to assume verified and not reverify. When using systematically, there are often cases where you know verification is successful and can skip doing it again.

We can add that in after this patch if needed, but I feel we should do so.

mehdi_amini added inline comments.Feb 18 2022, 11:33 AM
mlir/lib/IR/AsmPrinter.cpp
1274

Shouldn't the verifier failure be printing in generic mode already which should skip the verifier?
I'm not sure why we need this?

sgrechanik added inline comments.Feb 18 2022, 4:52 PM
mlir/lib/IR/AsmPrinter.cpp
1274

Not always. When the error is emitted like op1.emitError() << op2; then op1 will be printed in generic mode (or not at all), and op2 will be printed with default flags. This actually happens when verifying whether a block has a terminator in mlir/lib/IR/Verifier.cpp:154

Operation &terminator = block.back();
if (!terminator.mightHaveTrait<OpTrait::IsTerminator>())
  return block.back().emitError("block with no terminator, has ")
         << terminator;

And also printing an op may require verification of its parent op because AsmState may be constructed for the parent op.

mehdi_amini added inline comments.Feb 19 2022, 3:05 PM
mlir/lib/IR/AsmPrinter.cpp
1274

When the error is emitted like op1.emitError() << op2; then op1 will be printed in generic mode (or not at all), and op2 will be printed with default flags. This actually happens when verifying whether a block has a terminator in mlir/lib/IR/Verifier.cpp:154

But isn't this something we should fix in the verifier? Seems like unintended to me.

And also printing an op may require verification of its parent op because AsmState may be constructed for the parent op.

Similarly: if we print an op with the generic flag, then constructing the AsmState for the parent op could just inherit the flags again?

In general I'm wary of mutable global state when we can do differently: it's not clear to me still that this isn't just a workaround instead of something that is fundamentally needed if the system is consistent.

mehdi_amini added inline comments.Feb 19 2022, 3:32 PM
mlir/lib/IR/AsmPrinter.cpp
1274

I think we're always gonna hit custom verifier that may print op without propagating flags, I can't see a better option than what you have here right now, so LGTM for now.

mlir/test/Conversion/AffineToStandard/lower-affine.mlir
1–2 ↗(On Diff #403069)

I agree with River: we should keep the testing focused.

what about adding a test with invalid IR and check that the printed op is in generic form?

rriddle added inline comments.Feb 19 2022, 3:39 PM
mlir/lib/IR/AsmPrinter.cpp
1274

We should avoid any kind of global or thread_local state. We should just switch how we handle printing operations in diagnostics, e.g. to just use generic form by default (i.e. change this https://github.com/llvm/llvm-project/blob/8e7995884a6525b44fe2f71318883ba2fec2d972/mlir/lib/IR/Diagnostics.cpp#L126).

sgrechanik added inline comments.Feb 23 2022, 3:36 PM
mlir/lib/IR/AsmPrinter.cpp
1274

We can switch to generic form by default but that will somewhat worsen user experience by making error messages less pretty. I'm ok with it though.

Alternatively we can implement per-thread error suppression in DiagnosticEngine, but that would be a much bigger change.

sgrechanik edited the summary of this revision. (Show Details)
  • Switched to generic printing in diagnostics, removed the thread_local trick
  • Added a flag to skip verification
  • Implemented more explicit testing of invalid op printing
mehdi_amini accepted this revision.Feb 27 2022, 1:49 PM

Please wait for River to have another look!

Thanks :)

rriddle accepted this revision.Feb 28 2022, 3:16 PM

Nice.

mlir/test/Analysis/test-match-reduction.mlir
10

If we wanted, we could allow remarks to use the proper custom format. We would need to add some documentation somewhere about what types of diagnostics are expected from the verifier though.

This revision is now accepted and ready to land.Feb 28 2022, 3:16 PM
sgrechanik marked an inline comment as not done.Mar 1 2022, 5:44 PM
sgrechanik added inline comments.
mlir/test/Analysis/test-match-reduction.mlir
10

Makes sense, may be worth doing it in this review to avoid changing tests back and forth.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 5:44 PM
sgrechanik updated this revision to Diff 412296.Mar 1 2022, 5:46 PM
sgrechanik marked an inline comment as not done.
sgrechanik edited the summary of this revision. (Show Details)
  • Switch to generic form only when severity level is Error
sgrechanik updated this revision to Diff 412433.Mar 2 2022, 8:29 AM
  • Standard -> Func

@rriddle Please take another look, if you have any concerns I can revert to the last accepted state and extract the change into a separate patch.

Yeah, looks good. (Didn't mean to block, when I approved that meant I was okay with submitting/fixing the remaining comments before submitting).

Thanks, I'll merge it on Monday.

This revision was automatically updated to reflect the committed changes.