This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] Use UnknownSize for masked load/store MMO size
ClosedPublic

Authored by dmgreen on Nov 15 2021, 5:37 AM.

Details

Summary

A masked load or store accesses a number of bytes from a memory location that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this, which is less accurate than "may load/store up to 16 bytes", but less incorrect than "will load/store 16 bytes".
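
Roughly speaking, the change is at the point where the DAG builder creates the MachineMemOperand for the masked operation. The sketch below is illustrative (PtrOperand, Alignment and AAInfo stand in for whatever is in scope at the call site), not the literal patch:

// Previously the MMO claimed the whole vector width was accessed:
//   ... MachineMemOperand::MOStore, VT.getStoreSize(), Alignment, AAInfo ...
// With this change, the size is reported as unknown, since a masked store
// may touch anywhere from zero bytes up to the full vector width.
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
    MachinePointerInfo(PtrOperand), MachineMemOperand::MOStore,
    MemoryLocation::UnknownSize, Alignment, AAInfo);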

Diff Detail

Event Timeline

dmgreen created this revision. Nov 15 2021, 5:37 AM
dmgreen requested review of this revision. Nov 15 2021, 5:37 AM
Herald added a project: Restricted Project. Nov 15 2021, 5:37 AM

That would be good! Unfortunately the size is really stored as an LLT, which means we shouldn't be passing MemoryLocation::UnknownSize, but we can't use a LocationSize either.
https://github.com/llvm/llvm-project/blob/95102b7dc3c1b5b3f1b688221d9aa28cb1e17974/llvm/include/llvm/CodeGen/MachineFunction.h#L937
This should be creating an invalid LLT; I'll change that. The LLT gets used a fair amount in the GlobalISel lowering, it seems.

dmgreen updated this revision to Diff 387257. Nov 15 2021, 7:36 AM

Create an invalid LLT using LLT().
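
For reference, a default-constructed LLT is the "invalid" value that now stands for an unknown memory size (a small illustrative snippet, not part of the patch):

LLT Unknown;                     // LLT() is invalid, i.e. size unknown
assert(!Unknown.isValid());
LLT Full = LLT::scalar(128);     // by contrast, a known 16-byte access
assert(Full.isValid() && Full.getSizeInBytes() == 16);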

They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector).

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

we shouldn't be passing MemoryLocation::UnknownSize

If that's true, maybe we need to assert this? Apparently we are actually passing in MemoryLocation::UnknownSize in some places.

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

Why would it be any operation, not just masked loads/stores? The problem is that in this case the underlying object only had 10 dereferenceable chars, whereas the entire vector width is 16 bytes. I don't think that would apply to any normal load/store - it would already be UB at the llvm-ir level.

It's the equivalent of this code, but there is unfortunately no current way of adding MemoryLocations directly; it goes via an LLT size. Maybe we do want to change that, but I didn't want to try to do it here. It looks quite involved.
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/Analysis/MemoryLocation.cpp#L165
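
(For comparison, the IR-level handling linked above builds the MemoryLocation for a masked store with an upper-bound size rather than a precise one. This is paraphrased from memory, so treat the exact form as approximate:)

// Inside MemoryLocation::getForArgument, for the pointer argument of
// llvm.masked.store: the size is an upper bound, not a precise value.
return MemoryLocation(
    Arg,
    LocationSize::upperBound(
        DL.getTypeStoreSize(II->getArgOperand(0)->getType())),
    AATags);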

The exact thing going wrong here is that this code in the scheduler is adding chain dependencies:
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L942
which performs this AA check, which returns false (I think from BasicAA):
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L543
So there are no scheduling dependencies between the instructions and we end up with them in the wrong order. This started happening after we moved the VPT block pass later, so predicated MVE instructions are no longer in bundles during the post-RA scheduler.
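
To illustrate the shape of the failing query (names here are made up for the example): with a precise 16-byte size on a 10-byte object, BasicAA is entitled to answer "no alias", so no ordering edge is added between the two stores.

// Illustrative only - what the scheduler effectively asks BasicAA when the
// masked store's MMO claims a precise 16-byte access to a 10-byte alloca:
bool NoAlias = AA->isNoAlias(
    MemoryLocation(MaskedStorePtr, LocationSize::precise(16)),
    MemoryLocation(OtherStorePtr, LocationSize::precise(1)));
// NoAlias comes back true, so no chain dependency is added and the
// instructions are free to be reordered.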

we shouldn't be passing MemoryLocation::UnknownSize

If that's true, maybe we need to assert this? Apparently we are actually passing in MemoryLocation::UnknownSize in some places.

Sure, that sounds good, I can take a look. It might be simpler to make sure the relevant methods work sensibly with MemoryLocation::UnknownSize.

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

Why would it be any operation, not just masked loads/stores? The problem is that in this case the underlying object only had 10 dereferenceable chars, whereas the entire vector width is 16 bytes. I don't think that would apply to any normal load/store - it would already be UB at the llvm-ir level.

It's basically "masked" load/store operations. At the IR level, those are rare, sure. At the machine code level, there are operations other than llvm.masked.load that end up masked, though. For example, the ARM instruction "strne". Not sure how many places can end up with instructions like that off the top of my head.

The type legalization for llvm.masked.load also has the same issue as SelectionDAGBuilder.

It's basically "masked" load/store operations. At the IR level, those are rare, sure. At the machine code level, there are operations other than llvm.masked.load that end up masked, though. For example, the ARM instruction "strne". Not sure how many places can end up with instructions like that off the top of my head.

I don't believe that is how it works though, not exactly. In this case the backend is constructing a query of the form "does (%1, 16) alias (%arrayidx4, 1)?". And because the underlying object of %arrayidx4 is only 10 bytes large, the answer is no, as the alternative would be UB. But we are using the aliasing information at the IR level, not what the MIR has become. If the MIR has been altered to the point that the llvm-ir level aliasing info no longer applies then yes, that would be invalid, but I haven't seen that happen anywhere. A predicated store (strne) still uses the llvm-ir level aliasing info, which will be valid if the original store was in an if block (or however it was predicated). And so long as that aliasing query remains valid at the IR level, the size of the predicated store's memory location should be OK.

At least - I don't have an example of that going wrong. If you do know of a way to make it act incorrectly, let me know.

The type legalization for llvm.masked.load also has the same issue as SelectionDAGBuilder.

What do you mean by "type legalization" here exactly? I don't think global-isel handles masked loads/stores yet, and most of the illegal masked operation lowering is performed pre-isel. I wasn't sure what other type legalization you were referring to.

dmgreen updated this revision to Diff 388158. Nov 18 2021, 4:51 AM

Updated getMachineMemOperand overload to handle -1 sizes correctly.
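
In other words (an illustrative sketch of the mapping, not the exact code): a size of -1 / MemoryLocation::UnknownSize coming into the uint64_t-based getMachineMemOperand overload should be translated into an invalid LLT rather than into a gigantic scalar type.

// Hypothetical helper showing the intended mapping; the real change is in
// the MachineFunction::getMachineMemOperand overload itself.
static LLT memTypeForSize(uint64_t Size) {
  if (Size == MemoryLocation::UnknownSize)
    return LLT();                                       // invalid LLT: size unknown
  return LLT::scalar(8 * static_cast<unsigned>(Size));  // known byte size
}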

I believe Eli was referring to functions like DAGTypeLegalizer::SplitVecRes_MLOAD and DAGTypeLegalizer::SplitVecOp_MSTORE.

I believe Eli was referring to functions like DAGTypeLegalizer::SplitVecRes_MLOAD and DAGTypeLegalizer::SplitVecOp_MSTORE.

I see. Because it's recreating an MMO with the new width of the vector, not reusing the existing MMO. Thanks.

dmgreen updated this revision to Diff 388913. Nov 22 2021, 7:32 AM
dmgreen added a reviewer: kparzysz.

Update DAGTypeLegalizer::SplitVecRes_MLOAD, DAGTypeLegalizer::SplitVecOp_MSTORE and HexagonTargetLowering::SplitHvxMemOp, which now shows some differences in other tests. The other uses of getMaskedLoad/Store looked OK from what I could see.
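
The same idea applies in the splitting code (a sketch with illustrative names - Alignment, AAInfo and Ranges stand for whatever the surrounding function already has in hand): each half of a split masked load/store keeps an unknown-size MMO rather than claiming its full half-vector store size.

// MLD here is the MaskedLoadSDNode being split into lo/hi halves.
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
    MLD->getPointerInfo(), MachineMemOperand::MOLoad,
    MemoryLocation::UnknownSize, Alignment, AAInfo, Ranges);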

This revision is now accepted and ready to land. Nov 22 2021, 12:57 PM
This revision was landed with ongoing or failed builds. Nov 23 2021, 1:48 AM
This revision was automatically updated to reflect the committed changes.
yubing added a subscriber: yubing. Nov 25 2021, 12:01 AM

Hi @dmgreen, with this patch the llc command crashes. Would you take a look?
llc -mcpu=core-avx2 main.ll

define void @main() unnamed_addr #0 {
entry:
  %P.i150.i.i = alloca [3 x [3 x double]], align 16
  %0 = bitcast [3 x [3 x double]]* %P.i150.i.i to <8 x double>*
  call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> zeroinitializer, <8 x double>* %0, i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
  ret void
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.masked.store.v8f64.p0v8f64(<8 x double>, <8 x double>*, i32 immarg, <8 x i1>)
dmgreen added a comment (edited). Nov 25 2021, 1:54 AM

Hi @dmgreen, with this patch the llc command crashes. Would you take a look?
llc -mcpu=core-avx2 main.ll

define void @main() unnamed_addr #0 {
entry:
  %P.i150.i.i = alloca [3 x [3 x double]], align 16
  %0 = bitcast [3 x [3 x double]]* %P.i150.i.i to <8 x double>*
  call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> zeroinitializer, <8 x double>* %0, i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
  ret void
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.masked.store.v8f64.p0v8f64(<8 x double>, <8 x double>*, i32 immarg, <8 x i1>)

Thanks. It seems that the masked store is turned back into an (unmasked) store, still with an unknown size. The assert seems superfluous, but I will also try to fix the size when we promote the masked store to a plain store. Thanks for the simple reproducer.
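
For what it's worth, the direction of that follow-up would presumably look something like the sketch below (hypothetical, names illustrative): when the mask allows the masked store to be replaced by a plain store, rebuild the memory operand with the real store size instead of carrying over the unknown one.

// MST is the MaskedStoreSDNode being replaced by a plain store.
MachineMemOperand *OldMMO = MST->getMemOperand();
MachineMemOperand *NewMMO = DAG.getMachineFunction().getMachineMemOperand(
    OldMMO, /*Offset=*/0, MST->getMemoryVT().getStoreSize());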