This is an archive of the discontinued LLVM Phabricator instance.

[mlir][StandardToSPIRV] Emulate bitwidths not supported for store op.
ClosedPublic

Authored by hanchung on May 1 2020, 3:56 PM.

Details

Summary

As D78974, this patch implements the emulation for store op. The emulation is
done with atomic operations. E.g., if the storing value is i8, rewrite the
StoreOp to:

  1. load a 32-bit integer
  2. clear 8 bits in the loading value
  3. store 32-bit value back
  4. load a 32-bit integer
  5. modify 8 bits in the loading value
  6. store 32-bit value back

The step 1 to step 3 are done by AtomicAnd as one atomic step, and the step 4
to step 6 are done by AtomicOr as another atomic step.

Diff Detail

Event Timeline

hanchung created this revision.May 1 2020, 3:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2020, 3:56 PM
mravishankar requested changes to this revision.May 4 2020, 9:15 AM

@antiagainst PTAL at the suggestion here.

@hanchung : Thanks for porting this to MLIR core. This was what we had converged on in IREE where we were trying to avoid the expensive compare exchange. We tried this, but looking at it with fresh eyes this seems buggy.

mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
731

Might be easier to just convert the memref element type using the type converter (and do the same at the loadOp) instead of this.

747–757

I think these steps will not give the result that you want in all cases. There will be a race between the other threads that are modifying the same word, but different bits in the word. I dont see a way around it apart from using a atomic compare exchange.

do {

%val = load %basePtr[%baseOffset] : !spv.ptr<i32, ..>
%clearbits = and %val, %mask : i32 // Clear the bits that need to be updated.
%updateval = or %clearbits, %update : i32 // Update the bits as required

} while( %val != atomicCompareExchange(%basePtr, %updateVal, %val));

See semantics of compareExchange here : https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpAtomicCompareExchange. If the condition of the while failed, it implies the compareExchange succeeded, i.e. the value at %basePtr was same as the %val that was loaded. If the condition failed, then the value at %basePtr was different from what was loaded indicating a stale load (due to an other write to the same word).

This revision now requires changes to proceed.May 4 2020, 9:15 AM
antiagainst accepted this revision.May 4 2020, 9:30 AM

Nice! Thanks Hanhan! I just have a few nits. Feel free to land after addressing them.

mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
149

Not necessarily 32-bit here?

150

No need to require the whole StoreOpOperandAdaptor here given we just need op.value() in this function? Just passing in a Value? That way it creates cleaner interface.

304

The comment needs to be updated. :)

antiagainst marked an inline comment as done.May 4 2020, 9:42 AM

Sorry didn't see Mahesh's comment before commenting. :)

mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
731

This was requested by me. Actually depending on the storage class, we might want different capabilities, which determines whether we can use 16-bit or not in that particular storage class. For example, 16-bit in storage buffer is fine if we have StorageBuffer16BitAccess, but 16-bit in push constant is not fine if we don't have StoragePushConstant16. So we need to go through the memref here.

747–757

I might be missing something but the approach here should be fine given each thread is actually only touching strictly disjoint portion of the word? Could you elaborate a bit here regarding the issue?

mravishankar added inline comments.May 4 2020, 10:40 AM
mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
747–757

I dont think that is the case. Here is the proposed sequence of operations by this patch

load
atomicAnd
load
atomicOr.

Lets just focus on the last two instructions

load
atomicOr

between the load and the atomicOr some other thread might have written to the same word but at different bits. So the load value that is seen might be different than what exists in memory at the time of atomicOr. A thread can overwrite the entire word if it can guarantee that the entire word was same as it saw when it did the load. Otherwise its load value is stale. It has to abort and retry. Thats what the suggested snippet is doing. It ensures that when a thread updates the word, the value it saw at the load is not stale.

mravishankar accepted this revision.May 4 2020, 11:19 AM
mravishankar added inline comments.
mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
747–757

Sorry, got confused by the comment. There is no intermediate load. Then this works. Apologies for the noise.

This revision is now accepted and ready to land.May 4 2020, 11:19 AM
antiagainst added inline comments.May 4 2020, 11:23 AM
mlir/lib/Conversion/StandardToSPIRV/ConvertStandardToSPIRV.cpp
747–757

Yup the comment can be made clearer by saying "... to step 3 are done by spv.AtomicAnd as one atomic step, ... by spv.AtomicOr as another atomic step." Hanhan could you update here?

hanchung updated this revision to Diff 261943.May 4 2020, 3:09 PM
hanchung marked 9 inline comments as done.

Address comments

hanchung edited the summary of this revision. (Show Details)May 4 2020, 3:10 PM
This revision was automatically updated to reflect the committed changes.