This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify load/store of non-byte-sized types
Changes PlannedPublic

Authored by nikic on Apr 19 2022, 2:19 AM.

Details

Summary

For non-byte-sized types like i20, we currently specify that:

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.

These semantics are somewhat unclear, and likely incompatible with current optimization assumptions. If "undefined" in load specification is in the sense of "undefined behavior" that means that non-byte-sized loads may not be speculatable despite the memory being dereferenceable, which is not something we're taking into account right now. The "unspecified" in the store specification is just unclear, does this mean the bits become undef, or poison, or something else?

This patch specifies that poison is stored for padding of both aggregates and non-byte-sized types. Loads of such types don't require special treatment, as the padding is not observable anyway.

I believe this covers the practical requirements for non-byte-sized types, in that a) a hardware store will likely zero out the padding bits, which is compatible with poison and b) it is possible to eliminate a store of non-byte-sized type (e.g. due to "store of loaded value"), which will leave behind the original bits (which might be poison, so making the padding anything weaker than that would render the optimization illegal).

Diff Detail

Event Timeline

nikic created this revision.Apr 19 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2022, 2:19 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
nikic requested review of this revision.Apr 19 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2022, 2:19 AM

For loads, the default lowering in the backends assume that the padding bits are 0. Which is only possible if the store explicitly wrote those padding bits to 0. The default lowering of stores does that if it was stored with the same type. I know we've had bugs in X86 related to this. I'll see if I can find any.

nikic planned changes to this revision.Apr 19 2022, 2:43 PM

For loads, the default lowering in the backends assume that the padding bits are 0. Which is only possible if the store explicitly wrote those padding bits to 0. The default lowering of stores does that if it was stored with the same type. I know we've had bugs in X86 related to this. I'll see if I can find any.

Oh wow, you're right: https://llvm.godbolt.org/z/nf3898b89 These semantics aren't going to work then.

https://reviews.llvm.org/D91294 is one bug case I was able to find.

I confirmed that there are a few other architectures assuming the same semantics:

To support code generation to these architecture, storing an integer of a non-byte-sized type must fill the padding with zero.

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.
Padding at IR level is not observable normally. So we can define it as poison.

The only way to observe padding is through store/load of different types. But we specify that the location of padding is non-deterministic in the case of vectors. I don't see a reason for padding in integer types to be different.

The SDAG can happily assume that the padding is zero because stores at SDAG level enforce that. But we don't need to offer that semantics at IR level.

The only advantage of defining padding as zero would be, as the examples show, fold load+zext into a larger load. But for this to work we would need to define the memory layout, and I don't know if that's consistent across backends. And is it worth it?

nikic added a comment.Apr 21 2022, 5:24 AM

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.
Padding at IR level is not observable normally. So we can define it as poison.

The only way to observe padding is through store/load of different types. But we specify that the location of padding is non-deterministic in the case of vectors. I don't see a reason for padding in integer types to be different.

The SDAG can happily assume that the padding is zero because stores at SDAG level enforce that. But we don't need to offer that semantics at IR level.

The only advantage of defining padding as zero would be, as the examples show, fold load+zext into a larger load. But for this to work we would need to define the memory layout, and I don't know if that's consistent across backends. And is it worth it?

I don't think the SDAG semantics would be a refinement of the IR semantics though. In SDAG, a non-byte-size load performs an implicit assert that the padding bits are zero, while the IR semantics (as specified here) would allow that padding to be poison. Presumably this is where the "the result is undefined if the value was not originally written using a store of the same type" wording that LangRef uses right now comes from, as that allows making a consistent choice between load and store.

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.
Padding at IR level is not observable normally. So we can define it as poison.

The only way to observe padding is through store/load of different types. But we specify that the location of padding is non-deterministic in the case of vectors. I don't see a reason for padding in integer types to be different.

The SDAG can happily assume that the padding is zero because stores at SDAG level enforce that. But we don't need to offer that semantics at IR level.

The only advantage of defining padding as zero would be, as the examples show, fold load+zext into a larger load. But for this to work we would need to define the memory layout, and I don't know if that's consistent across backends. And is it worth it?

I don't think the SDAG semantics would be a refinement of the IR semantics though. In SDAG, a non-byte-size load performs an implicit assert that the padding bits are zero, while the IR semantics (as specified here) would allow that padding to be poison. Presumably this is where the "the result is undefined if the value was not originally written using a store of the same type" wording that LangRef uses right now comes from, as that allows making a consistent choice between load and store.

But the SDAG stores will pad with zeros. So any load will see those zeros.
The SDAG semantics are a refinement, They do the optimization of assuming that padding bits are zero because of its store semantics. So I think we are fine.

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.

I think any valid compilation of LLVM IR should be considered here.
There might be another compiler for LLVM IR that does not use SDAG internally. If padding is poison, it is valid for the other compiler to lower padding to non-zero bits, which will invalidate the transformations in various machines mentioned above.

nikic added a comment.Apr 22 2022, 2:57 AM

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.

I think any valid compilation of LLVM IR should be considered here.
There might be another compiler for LLVM IR that does not use SDAG internally. If padding is poison, it is valid for the other compiler to lower padding to non-zero bits, which will invalidate the transformations in various machines mentioned above.

Even with just the one compiler, if we take something like this:

define i8 @test(ptr %p) {
  store i8 -1, ptr %p
  %v = load i1, ptr %p
  %r = zext i1 %v to i8
  ret i8 %r
}

then with basic "padding is poison" semantics, this function must return i8 1, while the SDAG lowering would return i8 -1 (https://llvm.godbolt.org/z/bMEYnaqah with an optimization barrier to prevent folding).

I think to satisfy all requirements while keeping the current lowering, we'd need to do something like this:

Loads: When loading a non-byte-sized type, the result is poison if the padding bits do not have the value that a store of the same type would produce.

Stores: When storing a non-byte-sized type, if the stored value is poison then the padding bits are set to poison, otherwise the padding bits are set to a target-specific value.

The convoluted wording is to make sure that dropping the load/store in the following example is legal:

define void @src(ptr %p) {
  %v = load i1, ptr %p
  store i1 %v, ptr %p
  ret void
}

Here, if %p currently has correct padding, then this is clearly a no-op. If it has incorrect padding, then the load returns poison, and then the store writes a poison byte, which can be refined to the current value at the location.

Though I am wondering if we wouldn't be better off with actually changing the backend lowering in the interest of having sane IR semantics. I.e. the backend would not assume any specific value for the padding bits of load i1 and would have to introduce a masking operation. I don't think performance for these is particularly important, as frontends shouldn't be introducing such accesses anyway (at least clang and rustc will both use i8 as the in-memory type for i1).

As for i1, you have seen enum BooleanContent, correct?

Well, here we are discussing the semantics of LLVM IR. It's ok for the semantics of SDAG to be different, as long as it's a refinement of LLVM IR's.

I think any valid compilation of LLVM IR should be considered here.
There might be another compiler for LLVM IR that does not use SDAG internally. If padding is poison, it is valid for the other compiler to lower padding to non-zero bits, which will invalidate the transformations in various machines mentioned above.

That's fine, as long as you don't mix compiled files between those 2 compilers. But that's about the ABI, not the semantics of LLVM IR. Per Roman's comment, the layout is target specific, so we cannot define at the IR level.
Load/store with different types is not well-defined, so padding cannot be observed at IR level (usually). The only concern is load widening, but that's fixed with the byte type, and padding doesn't make the problem worse.

define i8 @test(ptr %p) {
  store i8 -1, ptr %p
  %v = load i1, ptr %p
  %r = zext i1 %v to i8
  ret i8 %r
}

I think we consider this example invalid as the store/load pair has different types.

As for i1, you have seen enum BooleanContent, correct?

Ah nice. That shows that the memory layout for i1 is target-specific.
It's easier if we define the padding as poison + load/store type mismatch as poison as well.

As for i1, you have seen enum BooleanContent, correct?

BooleanContent is not involved in load/store handling for i1 types. See these comments in LegalizeDAG.cpp

// Promote to a byte-sized store with upper bits zero if not                 
// storing an integral number of bytes.  For example, promote                
// TRUNCSTORE:i1 X -> TRUNCSTORE:i8 (and X, 1)
// The extra bits are guaranteed to be zero, since we stored them that       
// way.  A zext load from NVT thus automatically gives zext from SrcVT.
arsenm added a subscriber: arsenm.Apr 24 2022, 6:57 PM

As for i1, you have seen enum BooleanContent, correct?

BooleanContent is not involved in load/store handling for i1 types. See these comments in LegalizeDAG.cpp

BooleanContent is only used in boolean contexts, not that we have a definition for what that means. It's used only for compare results and select/branch conditions

lebedev.ri resigned from this revision.Jan 12 2023, 4:58 PM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.