This patch fixes the crash when printing some ops (like affine.for and
scf.for) when they are dumped in invalid state, e.g. during pattern
application. Now the AsmState constructor verifies the operation
first and switches to generic operation printing when the verification
fails. Also operations are now printed in generic form when emitting
diagnostics and the severity level is Error.
Details
Diff Detail
Event Timeline
We shouldn't be baking in verification checking into the custom printers. If anything we should have a dump/printer API that falls back to generic in the case of verification failure. It isn't feasible/maintainable to re-encode all of the verifier constraints, so we should keep custom printers clean from things they should be able to assume.
We can call opInfo->verifyInvariants(op) in OperationPrinter::printOperation and switch to generic printing on failure, something along the lines:
if (auto opInfo = op->getRegisteredInfo()) { // First verify the operation. If the operation fails to verify then the // custom print function may fail and we have to use the generic one. { // Ignore errors emitted by the verifier. ScopedDiagnosticHandler diagHandler( op->getContext(), [](Diagnostic &) { return success(); }); if (failed(opInfo->verifyInvariants(op))) { printGenericOp(op, /*printOpName=*/true); return; } } opInfo->printAssembly(op, *this, defaultDialectStack.back()); return; }
However, doing this exposes some bugs in other ops. I'll try to investigate.
I talked about this with River in the past: I'd call the verifier on the entry point of the printer (the top-level operation to print), which will verify recursively the IR down. And switch to generically printing for the whole IR and not on an op-by-op basis.
Not sure at which level to put the verification, Operation::dump seems to be a good candidate, it will fix the original crash and won't affect other kinds of printing.
mlir/test/Conversion/AffineToStandard/lower-affine.mlir | ||
---|---|---|
1–3 | LG, except for these. I'm not sure we should be adding these to the tests. Realistically tests shouldn't be crashing with -debug, but we shouldn't need to add debug individually to all of the tests to verify that. |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2914 | I think this should belong to the printer and not just "dump()" The behavior does not seem desirable to be coupled to the llvm::errs() in itself. We don't want to crash when printing in general. |
Fixing this not only for dump uncovered a bug in verification of spirv::ConstantOp, which should be fixed first: https://reviews.llvm.org/D118939
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2909 | There is (I believe) still a change that an error can be emitted here in multi threaded environment, it would require a new handler to be set after this but before verify below. Not a high probability event though. |
- Moved the check to Operation::print and Block::print
- Removed error suppression: errors may be helpful.
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2909 | Hmm, yes, you are right, and I don't immediately see how to solve it. But maybe actually we shouldn't try to suppress error messages here, because without error messages it's hard to tell why the printer switched to generic op printing. | |
2914 | Moved it to Operation::print and Block::print | |
mlir/test/Conversion/AffineToStandard/lower-affine.mlir | ||
1–3 | Do you have a better idea how to tests this? |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2894 | Please run git clang-format on your commit (arc should do it for you if clang-format is in the path I think) | |
2909 | I wouldn't expect diagnostic about verification failure in a place where I call "print" though. River: is there a safe way to silence these in the context of the printer? | |
2925 | Is this check needed/useful? Also you checked in the Operation::print is the generic printer flag isn't already set before verifying, should you do the same here? |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2909 | Generally if you only want diagnostics for this thread, you should check the result of llvm::get_threadid() within the handler against the one for the main thread. Something like: auto parentThreadId = llvm::get_threadid(); auto handlerFn = [&](Diagnostic &diag) { return success(parentThreadId == llvm::get_threadid()); }; |
- Error suppression is back
- fixed formatting errors
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2909 | Thanks, I'll use that. | |
2925 | I inserted this check to avoid messing with diagnostic handling if the block is empty. Here the flags are always the default ones. AsmState contains its own set of flags, but as far as I understand, they are not used for the actual printing. However, I wonder whether AsmState and OperationPrinter flags mismatch would cause any trouble. Maybe we shouldn't do the verification in functions taking AsmState, and verify in more higher level Operation::print(raw_ostream &os, const OpPrintingFlags &printerFlags) and Block::print(raw_ostream &os) instead? |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2925 | It's generally too late to do verification if the AsmState already exists, the construction of that may call back into operations (and thus crash if invalid). I'd just verify in the non-AsmState functions. Tangentially though, we should just expose the OpPrintingFlags held by the AsmState and use those (same applies for the other function). |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
2925 |
Then it probably makes sense to do that and perform verification in the AsmState constructor. It doesn't seem to be a lot of code change. |
Ok, let's do it gradually and expose printing flags in AsmState first: https://reviews.llvm.org/D119870
At the python level, we have a flag to assume verified and not reverify. When using systematically, there are often cases where you know verification is successful and can skip doing it again.
We can add that in after this patch if needed, but I feel we should do so.
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 | Shouldn't the verifier failure be printing in generic mode already which should skip the verifier? |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 | Not always. When the error is emitted like op1.emitError() << op2; then op1 will be printed in generic mode (or not at all), and op2 will be printed with default flags. This actually happens when verifying whether a block has a terminator in mlir/lib/IR/Verifier.cpp:154 Operation &terminator = block.back(); if (!terminator.mightHaveTrait<OpTrait::IsTerminator>()) return block.back().emitError("block with no terminator, has ") << terminator; And also printing an op may require verification of its parent op because AsmState may be constructed for the parent op. |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 |
But isn't this something we should fix in the verifier? Seems like unintended to me.
Similarly: if we print an op with the generic flag, then constructing the AsmState for the parent op could just inherit the flags again? In general I'm wary of mutable global state when we can do differently: it's not clear to me still that this isn't just a workaround instead of something that is fundamentally needed if the system is consistent. |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 | I think we're always gonna hit custom verifier that may print op without propagating flags, I can't see a better option than what you have here right now, so LGTM for now. | |
mlir/test/Conversion/AffineToStandard/lower-affine.mlir | ||
1–3 | I agree with River: we should keep the testing focused. what about adding a test with invalid IR and check that the printed op is in generic form? |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 | We should avoid any kind of global or thread_local state. We should just switch how we handle printing operations in diagnostics, e.g. to just use generic form by default (i.e. change this https://github.com/llvm/llvm-project/blob/8e7995884a6525b44fe2f71318883ba2fec2d972/mlir/lib/IR/Diagnostics.cpp#L126). |
mlir/lib/IR/AsmPrinter.cpp | ||
---|---|---|
1216 | We can switch to generic form by default but that will somewhat worsen user experience by making error messages less pretty. I'm ok with it though. Alternatively we can implement per-thread error suppression in DiagnosticEngine, but that would be a much bigger change. |
- Switched to generic printing in diagnostics, removed the thread_local trick
- Added a flag to skip verification
- Implemented more explicit testing of invalid op printing
Nice.
mlir/test/Analysis/test-match-reduction.mlir | ||
---|---|---|
10 ↗ | (On Diff #411513) | If we wanted, we could allow remarks to use the proper custom format. We would need to add some documentation somewhere about what types of diagnostics are expected from the verifier though. |
mlir/test/Analysis/test-match-reduction.mlir | ||
---|---|---|
10 ↗ | (On Diff #411513) | Makes sense, may be worth doing it in this review to avoid changing tests back and forth. |
@rriddle Please take another look, if you have any concerns I can revert to the last accepted state and extract the change into a separate patch.
Yeah, looks good. (Didn't mean to block, when I approved that meant I was okay with submitting/fixing the remaining comments before submitting).
Shouldn't the verifier failure be printing in generic mode already which should skip the verifier?
I'm not sure why we need this?