This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tensor] Improve verifiers: detect out-of-bounds accesses
Changes PlannedPublic

Authored by springerm on Jul 23 2023, 10:49 AM.

Details

Summary
  • Disallow slices that run out of bounds.
  • Disallow negative indices/offsets.
  • Fix invalid test cases.

Diff Detail

Event Timeline

springerm created this revision.Jul 23 2023, 10:49 AM
Herald added a project: Restricted Project. · View Herald Transcript
springerm requested review of this revision.Jul 23 2023, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2023, 10:49 AM

I'm going to remove verification for offsets/sizes that are SSA values. According to the developer guide we should only verify static_sizes and static_offsets.

springerm updated this revision to Diff 543578.Jul 24 2023, 9:06 AM

address comments

springerm edited the summary of this revision. (Show Details)Jul 24 2023, 11:02 AM
springerm edited the summary of this revision. (Show Details)

udpate

springerm edited the summary of this revision. (Show Details)Jul 24 2023, 11:03 AM

I'm going to remove verification for offsets/sizes that are SSA values. According to the developer guide we should only verify static_sizes and static_offsets.

I think static sizes/offsets can be generated from ssa ones by existing folders/canonicalization, so there are really no difference. IMO, we shouldn't fail verification on this as they can appear as result of valid transformations (e.g. in unreachable branches).

nicolasvasilache added a comment.EditedJul 25 2023, 9:01 PM

Clearly this caught some UB cases and has value as a sanity check we would want to run manually.
However it does go against the principles outlined in the link posted by @jpienaar: turning UB into a verification error is a no-go as it breaks local invariants that need to hold.

You could have a dedicated pass which detects such UB, it would be similar to -test-memref-bound-check.
But I am afraid we cannot do significantly better as other invariants would break badly.

I think static sizes/offsets can be generated from ssa ones by existing folders/canonicalization, so there are really no difference.

those would have to be updated to only fold if the constant value is valid

I think static sizes/offsets can be generated from ssa ones by existing folders/canonicalization, so there are really no difference.

those would have to be updated to only fold if the constant value is valid

I think it would be OK to generate invalid ops during folding. (This can happen anyway, even if we do more aggressive checking of SSA values, because the verifier cannot take into account any possible foldings/canonicalizations that could happen.) garbage in => garbage out

We can check do extra runtime checks for SSA value offsets/sizes/strides as part of -generate-runtime-verification.

I think static sizes/offsets can be generated from ssa ones by existing folders/canonicalization, so there are really no difference.

those would have to be updated to only fold if the constant value is valid

I think it would be OK to generate invalid ops during folding. (This can happen anyway, even if we do more aggressive checking of SSA values, because the verifier cannot take into account any possible foldings/canonicalizations that could happen.) garbage in => garbage out

I thought one of the main points of https://mlir.llvm.org/getting_started/DeveloperGuide/#ir-verifier was to avoid this? if verifiers only verify local constraints, folders can check that they don't violate them.

I thought one of the main points of https://mlir.llvm.org/getting_started/DeveloperGuide/#ir-verifier was to avoid this? if verifiers only verify local constraints, folders can check that they don't violate them.

My thinking was: A folder/canonicalization may generate an op with invalid static offsets/sizes/strides (e.g., by promoting SSA values to attributes) and then next time the verifier is running, it will be caught (hopefully). Alternatively, the folder/canonicalization pattern could also just leave such ops alone that would be canonicalized/folded to ops that would not pass the verifier. (That would make those folders/patterns more complex and we would have to duplicate (or call) some verification code there.)

Also just to clarify: The purpose of this change is to clarify in the documentation that out-of-bounds accesses are invalid and add explicit verifiers for that. I think we already treated such ops as "invalid", it was not mentioned in the documentation. But such ops do not lower correctly and result in invalid IR during vectorization/bufferization.

Consider following program

%0 = ... tensor<5xf32>
%1 = tensor.cast %0 tensor<5xf32> to tensor<?xf32>
%dim = dim %1 0 : tensor<?xf32>
%cond = <nontrivial calculation depending on %dim>
%res = scf.if %cond {
  %2 = tensor.extract %1 [42] : tensor<?xf32>
  yield %2
} else {
  yield %default
}

which can be canonicalized to

%0 = ... tensor<5xf32>
%dim = dim %0 0 : tensor<?xf32>
%cond = <nontrivial calculation depending on %dim>
%res = scf.if %cond {
  %2 = tensor.extract %0 [42] : tensor<5xf32>
  yield %2
} else {
  yield %default
}

Which is still a valid program, but now will be wrongfully rejected by verifier
Moreover, when we detected invalid indices, it can be transformed further to

%0 = ... tensor<5xf32>
%dim = dim %0 0 : tensor<?xf32>
%cond = <nontrivial calculation depending on %dim>
%res = scf.if %cond {
  %2 = poison
  yield %2
} else {
  yield %default
}

Consider following program

This is an interesting example. I'm not familiar with "poison". What does this mean for op verification? Does this mean that we should never verify anything? Can the same thing happen with things that the ExtractSliceOp/... verifiers are already checking at the moment?

kuhar added a subscriber: kuhar.Jul 26 2023, 12:48 PM

Which is still a valid program, but now will be wrongfully rejected by verifier

Consider following program

This is an interesting example. I'm not familiar with "poison". What does this mean for op verification? Does this mean that we should never verify anything? Can the same thing happen with things that the ExtractSliceOp/... verifiers are already checking at the moment?

-1, I would not call the transformed program valid. Local properties like non-ssa OOB indices should make ops invalid and fail in the verifier. And separately, any canonicalization/folds that would create invalid ops should instead either abort folding or replace them with poison/unreachable.

kuhar accepted this revision.Jul 27 2023, 10:45 AM

LGTM % nits

mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
266

Did you want to highlight some difference by using 'should' and 'must'? If there is none, I'd prefer to use 'must' everywhere

739

same here

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
182–185
This revision is now accepted and ready to land.Jul 27 2023, 10:45 AM
springerm planned changes to this revision.Jul 27 2023, 10:50 AM

I’m going to add a check to the dynamic->static offset/sizes canonicalization so that no ops would be generated that are invalid

Hardcode84 added a comment.EditedJul 27 2023, 11:17 AM

Which is still a valid program, but now will be wrongfully rejected by verifier

Consider following program

This is an interesting example. I'm not familiar with "poison". What does this mean for op verification? Does this mean that we should never verify anything? Can the same thing happen with things that the ExtractSliceOp/... verifiers are already checking at the moment?

-1, I would not call the transformed program valid. Local properties like non-ssa OOB indices should make ops invalid and fail in the verifier. And separately, any canonicalization/folds that would create invalid ops should instead either abort folding or replace them with poison/unreachable.

While idea of skipping invalid states entirely will work in principle, IMO it is much less composable. Not only we will need to add checks into indices canonicalizers, but also to cast canonicalizers, and, probably, to other places we are not even aware of yet.